Sir: 

This Reply Brief is filed in response to the Examiner's Answer, mailed 8/28/01. 



Status of Amendments 

An Amendment After Final Action was filed October 29, 2001 to cancel subject matter 
from claims 73 and 81. Applicants note that U.S. Patent No. 6,063,596 has issued from the PCT 
application cited by Applicants during prosecution (Int'l. Pub. No. WO 99/29849; see paper 8 
including IDS initialed by Examiner in Office Action mailed 8/25/00). Because the claims of 
this issued patent read on the proposed claims of the present application, Applicants filed the 
amendment in order to reduce issues on appeal. A copy of the claims prior to this amendment 
can be found attached to the Appeal Brief filed in this case on July 10, 2001; the claims attached 
hereto are shown as if the changes suggested in the Amendment After Final Action had been 
entered. 



Introduction 

The Examiner has raised several new arguments in the Examiner's Answer and has 
renewed grounds for the rejection that were previously overcome by the Appellants. In the final 
Office Action, the Examiner states, "Applicants have not clearly demonstrated that the cloned 
nucleic acid and its encoded polypeptide is actually a GPCR as was noted in the utility rejection," 
but "Applicants do indeed provide multiple well established and specific utilities for a GPCR." 
See, Office Action dated February 12, 2001, page 3. In the Examiner's Answer, the Examiner 
agreed with Appellants' statement of the issues in the Appeal Brief that the single issue was 
whether 2871 is a GPCR (Examiner's Answer, page 2). However, the Examiner proceeds to set 
forth a new ground for the utility rejection, indicating that even if Applicants establish 2871 as a 
GPCR, members of the GPCR family of polypeptides do not have well-established utility. See, 
Examiner's Answer, pages 4-5. 

The Examiner also cites a new reference in the Examiner's Answer in support of the 
argument that a protein's sequence may not be used to predict its function (Attwood (2000) 
Science 290:471-473). The Examiner's Answer further includes citations to two references that 
were cited in the first Office Action as a grounds for rejection under 35 U.S.C. § 101, but that 
were not cited in the final Office Action. The fact that these references were not cited in the final 
Office Action led Appellants to believe that the rejection had been overcome to the extent that it 
was based on these references, and therefore the Examiner's arguments were not addressed in 
detail in the Appeal Brief. 

Appellants note that this change in direction by the Examiner, which was not explained, 
is a practice which makes patent prosecution more difficult. This practice serves to obscure the 
basis for the rejection and runs the risk of unfairly prejudicing applicants' nascent property rights 
in their patentable subject matter. As stated by the Federal Circuit in In re Oetiker, "[t]he 
examiner cannot sit mum, leaving the applicant to shoot arrows in the dark hoping to somehow 
hit a secret objection harbored by the examiner." 977 F.2d 1443, 24 USPQ2d 1443, 1447 (Fed. 
Cir. 1992) (Plager, J., concurring). 

Because the Examiner previously admitted that GPCRs have "multiple well-established 
and specific utilities," Appellants did not fully address the utility of GPCRs in the Appeal Brief. 
It is requested that the rejection be withdrawn or prosecution be reopened to give Appellants a 
fair opportunity to respond to the new and renewed grounds of rejection. However, should the 
rejection not be withdrawn and prosecution not be reopened, Applicants here present these 
arguments in response to the Examiner's new and revived grounds of rejection. Responses to the 
Examiner's new and revived arguments are addressed below in section A, while the issue of 
utility of the present invention is discussed below in section B. 

Argument 

2871 Encodes a G-Protein Coupled Receptor 

A. The evidence presented by the Examiner does not address the methods used by the 
Appellants to determine 2871 receptor function. 

1. Berendsen is directed to the de novo prediction of protein tertiary 
structure from primary structure, not the prediction of protein function based on the 
presence of conserved functional domains. 

In the first Office Action and the Examiner's Answer, the Examiner cited Berendsen et 
al. (1998) Science 282:642-643 in support of the argument that protein activity predictions based 
on functional domains are unpredictable. Because this reference was not cited in the final Office 
Action, Appellants believed that the rejection had been overcome to the extent that it was based 
on this reference. However, the Examiner has cited Berendsen in the Examiner's Answer and 
thus it will be addressed here. 

The teachings of Berendsen are directed to methods of predicting a protein's tertiary 
structure from its primary sequence. Berendsen states, "[t]he prediction of the native 
conformation of a protein of known amino acid sequence is one of the great open questions in 
molecular biology and one of the most demanding challenges in the new field of bioinformatics," 
and then proceeds to discuss computer simulations of protein folding. See Berendsen at 642. In 
the Examiner's Answer, the Examiner appears to acknowledge that Berendsen is not directed to 
the functional domain based predictions of protein function utilized by Appellants, but notes that 
"the activity of any protein or polypeptide is dependent on its structure." (Examiner's Answer, 
page 9). While Appellants agree that some regions of a protein must retain a certain 
conformation in order for the protein to be active, it does not follow that a protein's tertiary 
structure must be known in order to determine the activity of that protein. In fact, three- 
dimensional structures have been elucidated for only a very few of the thousands of proteins 
having known biochemical or physiological activity. Accordingly, the caveats regarding 
predictions of tertiary structure found in Berendsen are not relevant to methods for predicting 
protein function used by the Appellants. 

2. Galperin is directed to context-based methods of predicting protein 
function, not to predictions of protein function based on the presence of functional 
domains. 

In the first Office Action, the Examiner cited Galperin et al. (2000), Nature Biotechnology 
18:609-613, in support of the argument that a protein's function cannot be predicted from the 
presence of conserved functional domains. This reference was not cited in the final Office 
Action, leading Appellants to believe that the rejection was overcome to the extent that it was 
based on this reference. However, the Examiner has cited the reference in the Examiner's 
Answer and thus it will be addressed here. 

The teachings of Galperin are directed to the prediction of protein function using 
comparative genomic approaches. The abstract for the Galperin reference states, "[s]everal 
recently developed computational approaches in comparative genomics go beyond sequence 
comparison. By analyzing phylogenetic profiles of protein families, domain fusions, gene 
adjacency in genomes, and expression patterns, these methods predict many functional 
interactions between proteins and help deduce specific functions for numerous proteins." The 
authors then proceed to discuss the strengths and weaknesses of these genomic context-based 
methods of functional prediction. Accordingly, the primary teachings of Galperin are not 
directed to the methods used by the Appellants to predict 2871 function. 

In rebutting Appellants' arguments, the Examiner quotes Galperin: "sequence 
comparison methods, even the best ones, are of little help when a protein has no homologs in 
current databases or when all hits are to uncharacterized gene products." While Appellants agree 
that sequence similarity with uncharacterized gene products cannot be used to determine a 
protein's activity, this caveat does not apply to the determination of the function of the 2871 
receptor. In the present case, the function of the 2871 receptor was determined based on the 
presence of sequence similarity with a conserved functional domain characteristic of the 
rhodopsin family of GPCRs. As described fully in Appellants' Appeal Brief and illustrated in 
Appendix E of the same, this conserved functional domain was elucidated from the sequences of 
a number of rhodopsin-family GPCRs having known biochemical activities. Accordingly, Galperin's 
statement that "sequence comparison methods . . . are of little help . . . when all hits are to 
uncharacterized gene products" is true but does not undermine the reliability of the methods of 
functional prediction used by the Appellants. 

The only additional teachings that Galperin provides regarding prediction of protein 
function based on sequence similarity with proteins of known function are also supportive of the 
strength and reliability of the methods used in the present application. Galperin states (page 613, 
column 1) that comparative genomic methods of predicting protein function discussed in the 
reference "provide a useful extension of, and in a sense a genome-based framework for, sequence 
and structural methods which remain the cornerstone of computational genomics." Thus, 
Galperin distinguishes between the comparative genomics-based methods of functional 
prediction reviewed in the reference and the pattern based methods for functional prediction used 
by the Appellants, and further demonstrates that the authors consider the approach used by 
Appellants to be reliable. 

3. Attwood distinguishes between the reliability of module-based prediction 
of protein function and pattern-based prediction of protein function and presents 
arguments supporting the diagnostic reliability of pattern databases. 

The Examiner's Answer includes a citation to a new reference (Attwood (2000) Science 
290:471-473) in support of the argument that sequence similarity cannot be used to predict 
protein function. Specifically, the Examiner cites the statements: "[i]f the best hit in a database 
search is a match to a single domain module, it is unlikely that the function annotation can be 
propagated from the parent protein to the query sequence," and "[t]he presence of a module tells 
little of the function of the complete system; knowing most of the components of a mosaic does 
not allow us easily to predict a missing one, and modules in different proteins do not always 
perform the same function." Attwood (2000) Science 290 at page 472, column 2. A careful 
reading of the Attwood reference makes it clear that these statements refer not to the prediction 
of protein function based on the presence of a conserved functional domain, but rather to the 
prediction of function based on the presence of a single motif or module. Such modules are 
defined by Attwood as "autonomous folding units that often function as protein building blocks, 
forming multiple combinations of the same module or mosaics of different modules." Id. In the 
present case, Appellants have determined the function of the 2871 receptor based on the fact that 
255 contiguous amino acids of the 2871 polypeptide provide an excellent fit to the empirically- 
derived model of the GPCR family that includes rhodopsin. This statistical model is not solely 
based on the presence of a single autonomous folding unit. 

The differences between the reliability of motif or module-based methods of protein 
function prediction and functional domain-based methods of function prediction are discussed in 
greater detail in Attwood (2000) Int. J. Biochem. Cell Biol 32:139-155 ("IJBCB" provided as 
Appendix G), a more comprehensive review article published by Attwood in the same year as the 
reference cited by the Examiner. In this reference, Attwood teaches that while functional 
prediction methods based on the presence of a single motif may be problematic because matches 
to single motifs lack biological context (see Attwood, IJBCB at 144), many of the flaws inherent 
in these single motif-based methods are overcome in pattern databases such as Pfam. Attwood 
states: 

"[p]attern databases offer several benefits: (i) by distilling multiple 
sequence information into family descriptors, trivial errors in the 
underlying sequences may be diluted; (ii) annotation errors may be 
quickly spotted if the description of one sequence differs from that 
of its family; and (iii) they allow specific diagnoses, placing 
individual sequences in a family context for a more informed 
assessment of possible function." 

Attwood, IJBCB at 153. 

Attwood also teaches the diagnostic advantages of manually-generated databases such as 
Pfam (which is based on hand-edited seed alignments; see Attwood, IJBCB at 149). Attwood 
states, "manually annotated databases are set apart from their automatically created counterparts 
by virtue of (i) providing validation of results and (ii) offering detailed information that helps to 
place conserved sequence information in structural or functional contexts." (Attwood, IJBCB at 
152). Attwood further states that while pattern databases are small in comparison with sequence 
repositories, "their diagnostic potency ensures that pattern databases will play an increasingly 
important role as the post-genome quest to assign functional information to raw sequence data 
gains pace." (Attwood, IJBCB at pp. 153-154). Thus the teachings by Attwood regarding pattern 
databases, particularly manually-generated pattern databases, are strongly supportive of the 
reliability of these techniques. 

Thus, the Examiner seizes on a single brief review article by Attwood about caveats of 
sequence comparison methods to discredit sequence comparison methods in general (Examiner's 
Answer, page 7, "protein function cannot be ascertained from analysis of its components.") 
Applicants agree generally with Attwood's argument in the new reference cited by the Examiner 
that predictions of protein function based on a single motif are not necessarily reliable. However, 
those of skill in the art distinguish between the presence of a single motif in a protein and the 
presence of configurations of multiple motifs, or a pattern, which is diagnostic of a particular 
protein family. 

Attwood has published a number of articles describing patterns that are diagnostic of 
G-protein coupled receptors,¹ and is known as one of the creators of the PRINTS sequence 
comparison method and database. Perhaps most pertinent here is an article published by 
Attwood after the article cited by the Examiner, entitled: "A compendium of specific motifs for 
diagnosing GPCR subtypes." Attwood (2001) TRENDS in Pharmacological Sciences 22(4): 
162-165 ("TiPS" provided as Appendix H). In this article, Attwood discusses the differences 
between several sequence comparison methods and describes the use of her PRINTS methods 
and database for the analysis of GPCRs (available at 
http://bioinf.man.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/muppet/FPScan.cgi, as indicated in 
Figure 1). See Attwood, TiPS at 164. 

¹ Attwood's work includes: Attwood and Beck (1994) Protein Eng. 7(7):841-848, entitled "PRINTS - a protein 
motif fingerprint database"; Attwood and Findlay (1994) Protein Eng. 7(2):195-203, entitled "Fingerprinting 
G-protein Coupled Receptors"; Attwood et al. (1991) Gene 98(2):153-159, entitled "Multiple Sequence Alignment of 
Protein Families Showing Low Sequence Homology: A Methodological Approach Using Database Pattern-matching 
Discriminators for G-protein-linked Receptors." 

A PRINTS analysis of the closest publicly disclosed polypeptide sequence to the subject 
of the present application (i.e., the sequence disclosed in U.S. Patent No. 6,063,596 as SEQ ID 
NO: 3) shows an identification of the "GPCRRHODOPSN" fingerprint, with an E-value of 3.1e-29 
and a P-value of 1.2e-34 (see output, attached as Appendix I). As indicated in the 
documentation for PRINTS also available at this site, "[t]he reported P-value of any fingerprint 
result is the product of the p-values for each motif. The motif p-values represent the probability 
that a comparison between the motif and a random sequence would achieve a score greater than 
or equal to the score attributed to the match between your query sequence and the motif." The 
E-value is the expected number of occurrences of sequences scoring greater than or equal to the 
query's score. Thus, the very low P-value and E-value obtained from Attwood's PRINTS 
analysis concur with the Pfam diagnosis described by Applicants that the 2871 sequence is a 
GPCR. Accordingly, the Examiner's use of Attwood to discredit sequence comparison methods 
in general is inconsistent with Attwood's work, which strongly supports the conclusion that the 
2871 sequence is a GPCR. 
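
For illustration only, the arithmetic described in the quoted PRINTS documentation can be sketched as follows. The function name and the seven motif p-values are hypothetical; they are not taken from Appendix I or from the PRINTS software. The sketch assumes only what the quoted passage states, namely that the fingerprint P-value is the product of the individual motif p-values.

```python
import math


def fingerprint_p_value(motif_p_values):
    """Multiply per-motif p-values to obtain a fingerprint P-value.

    Hypothetical helper, not part of PRINTS. The product is accumulated
    in log space so that very small values do not underflow.
    """
    if not motif_p_values:
        raise ValueError("at least one motif p-value is required")
    for p in motif_p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("each p-value must lie in the interval (0, 1]")
    log10_product = sum(math.log10(p) for p in motif_p_values)
    return 10.0 ** log10_product


# Hypothetical p-values for a seven-motif fingerprint (illustrative only).
example_motif_p_values = [1e-5, 2e-6, 3e-4, 1e-5, 5e-6, 2e-5, 1e-4]
combined = fingerprint_p_value(example_motif_p_values)
print(f"combined fingerprint P-value: {combined:.1e}")
```

Under these assumed inputs, seven individually modest motif matches combine to a fingerprint P-value on the order of 10^-34, which shows why a match to every motif of a multi-motif fingerprint is far stronger evidence than a match to any single motif.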

4. The Examiner's failure to credit the predictive power of sequence 
comparison methods is at odds with accepted practice in the art. 

The Examiner notes (Examiner's Answer, paper number 18 mailed 8/28/01, page 3) that 
"[m]oreover, the specification discloses that the cloned GPCR shares a high score with the seven 
transmembrane rhodopsin family," and further states on page 4 that "the specification notes that 
proteins with putative seven transmembrane domains, much like applicants, are not necessarily 
GPCRs such as boss and fz cloned from Drosophila." The Examiner also states (Examiner's 
Answer, pages 6-7) that "Figure 2 provides for only the DRY triplet and low sequence 
homology." Based partly on this line of reasoning, the Examiner asserts that the specification 
lacks "a specific and substantial utility [and] a well established utility." 

This line of reasoning by the Examiner is inconsistent with the understanding of one of 
skill in the art of Pfam alignments, and of sequence comparisons in general. As known to those 
of skill in the art (and described in the Pfam documentation available at 
http://pfam.wustl.edu/faq.shtml), Pfam alignments do not display homology between pairs of 
sequences but rather display the fit of a particular query sequence to a particular protein family 
model. As discussed on the Pfam "Help Page:FAQ" available at the address above, complaints 
[like the Examiner's present complaint] about the quality of the alignments generally arise 
"because people aren't used to looking at multiple alignments of hundreds or thousands of 
sequences. Remember that a rare insertion in even just one sequence [in the protein family] 
means having to open a gap in the whole alignment: Pfam full alignments look very gappy for 
this reason, but in fact they're not." 

The Examiner also ignores that boss (bride of sevenless) and fz (frizzled) show low 
similarities to GPCR domains in Pfam alignments. One of skill in the art understands that Pfam 
alignments of boss and frizzled with the highest-scoring seven transmembrane domain models for 
each (7tm_3 and 7tm_2, respectively) have negative scores. In contrast, the 2871 sequence has a 
high positive score for the rhodopsin subfamily that is described by Pfam model 7tm_1. Pfam 
"bit scores" represent the log base 2 of a ratio. In the numerator of this ratio is the probability of 
the sequence given the hypothesis that the sequence belongs to the protein family being modeled. 
In the denominator of this ratio is the probability of the sequence given the hypothesis that the 
sequence was generated according to a random background model. Thus, the bit score of 183 for 
protein 2871 with the Pfam 7tm_1 model means this protein sequence is 2¹⁸³ times more likely to 
be observed if it were generated by the 7tm_1 model than if the sequence were generated by the 
random background model. We note that 2¹⁸³ (about 1.2 x 10⁵⁵) greatly exceeds the estimated 
number of atoms comprised by the planet Earth. In contrast, the optimal score for boss to a GPCR 
family is -53, and the optimal score for frizzled is even lower, at -112. In other words, the 
sequence of boss is 2⁵³ times more likely to be observed if it were generated by the random 
background model than if it were generated by the best-fitting GPCR model. Although 2⁵³ does 
not exceed the estimated number of atoms that are comprised by the planet Earth, we note that 
2⁵³ is an extremely large number (about 9 x 10¹⁵). Thus, contrary to the Examiner's arguments, 
the fact that the boss and frizzled proteins have seven transmembrane domains does not detract 
from Applicants' evidence that the sequences of the present invention are GPCRs. 
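
The bit score arithmetic above can be checked with a short calculation. The following is a minimal sketch, assuming only the definition stated above (a bit score is the base-2 logarithm of the ratio of the probability under the family model to the probability under the random background model). The function names are hypothetical and are not part of Pfam or of any hidden Markov model software.

```python
import math


def bit_score(p_family_model, p_background_model):
    """Base-2 log of the ratio of two probabilities (hypothetical helper)."""
    return math.log2(p_family_model / p_background_model)


def odds_ratio_from_bit_score(score):
    """Express 2**score in scientific notation as (mantissa, exponent).

    Working in base-10 logarithms keeps the calculation valid for large
    positive scores and for negative scores alike.
    """
    log10_ratio = score * math.log10(2)
    exponent = math.floor(log10_ratio)
    mantissa = 10.0 ** (log10_ratio - exponent)
    return mantissa, exponent


# Scores reported in the text: 183 (2871 vs. 7tm_1), -53 (boss), -112 (frizzled).
for label, score in [("2871", 183), ("boss", -53), ("frizzled", -112)]:
    mantissa, exponent = odds_ratio_from_bit_score(score)
    print(f"{label}: 2^{score} is about {mantissa:.2f} x 10^{exponent}")
```

For a score of 183 the ratio is about 1.2 x 10^55 in favor of the family model. For a score of -53 the ratio is about 1.1 x 10^-16, which is the reciprocal of the figure of about 9 x 10^15 in favor of the random background model given above.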

The Examiner has attacked Applicants' use of sequence comparison methods by quoting 
caveats largely out of context. As one of skill in the art is aware, any methodology is fallible to 
some degree and there are always exceptions to a rule; thus, most if not all articles describing 
sequence comparison methods also discuss the shortcomings of those methods. The Examiner 
seizes on these caveats to discredit the use of sequence comparison methods. The Examiner's 
approach is at odds with that of the art, which has embraced sequence comparison methods, 
particularly as those methods have advanced in sophistication with the rapid advances of the 
genomic era. 

A brief survey of PubMed (accessible at http://www.ncbi.nlm.nih.gov/) shows dozens of 
peer-reviewed, scientific articles published every month describing novel discoveries of 
sequences having strong identity to sequences of known function. The acceptance of sequence 
comparison methods by the art is evidenced in many places. For example, Mount (2001) 
Bioinformatics: Sequence and Genome Analysis (Cold Spring Harbor Laboratory Press, Cold 
Spring Harbor, New York), page 282 (provided as Appendix J) states that "[d]atabase similarity 
searches have become a mainstay of bioinformatics." Mount goes on to explain that, "[a]s a 
rough rule, if more than one-half of the amino acid sequence of query and database proteins is 
identical in the sequence alignments, the prediction is very strong. As the degree of similarity 
decreases, confidence in the prediction also decreases. The programs used for these database 
searches provide statistical evaluations that serve as a guide for evaluation of the alignment 
scores." As noted by Gusfield (1997) Algorithms on Strings, Trees, and Sequences: Computer 
Science and Computational Biology (Cambridge University Press, New York, New York), at 
pages 212-213 (provided as Appendix K), 

"[s]equence comparison, particularly when combined with the 
systematic collection, curation, and search of databases containing 
biomolecular sequences, has become essential in modern molecular 
biology. * * * The first fact of biological sequence analysis: In 
biomolecular sequences (DNA, RNA, or amino acid sequences), high 
sequence similarity usually implies significant functional or structural 
similarity. Evolution reuses, builds on, duplicates, and modifies 
"successful" structures (proteins, exons, DNA regulatory sequences, 
morphological features, enzymatic pathways, etc.). 
* * * 
'Today, the most powerful method for inferring the biological function 
of a gene (or the protein that it encodes) is by sequence similarity 
searching on protein and DNA sequence databases. With the 
development of rapid methods for sequence comparison, both with 
heuristic algorithms and powerful parallel computers, discoveries 
based solely on sequence homology have become routine.' [citation 
omitted] * * * It is now standard practice, whenever a new gene is 
cloned and sequenced, to translate its DNA sequence into an amino 
acid sequence and then search for similarities between it and members 
of the protein databases." 

Another indicator of the importance of sequence comparison methods to the "new 
paradigm" of modern molecular biology is the fact that the most-cited paper of 1990-1998 is the 
publication describing BLAST: Altschul (1990) J. Mol. Biol. 215:403, entitled "Basic Local 
Alignment Search Tool." (citation figures available at http://www.isinet.com/isi/hot/research) 
(provided as Appendix L). Accordingly, the Examiner's efforts to discredit sequence comparison 
methods in general are inconsistent with the art, which supports the use of sequence comparison 
methods and thus the conclusion that 2871 is a GPCR. 

5. Despite the fact that the histamine receptor family is divergent, members of 
this family were identified as GPCRs based on sequence similarity with known 
GPCRs. 

In the Appeal Brief, Appellants cited Nguyen et al. (2001) Mol. Pharmacol. 59:427-433, 
which describes the identification of the histamine receptor H4 based on sequence similarity with 
known GPCRs. In response, the Examiner has cited the teaching by Nguyen that the histamine 
receptors H1, H2, and H3 share less than 35% identity with one another and each has greater 
identity with other aminergic receptors. This statement by Nguyen supports rather than discredits 
the reliability of the methods of functional prediction used by the Appellants, as it demonstrates 
that the activity (in this case the G-protein mediated signal transduction activity) of a protein can 
be predicted based on sequence identity of less than 35%. 
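
For readers unfamiliar with the term, a percent identity figure such as the 35% discussed above can be computed from a pairwise alignment roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the two short aligned strings are invented for illustration and are not histamine receptor sequences, and identity is taken here as matches divided by the number of aligned columns in which neither sequence has a gap. Other conventions for the denominator exist, and the source does not say which one Nguyen used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned columns holding identical residues.

    Hypothetical helper. Both inputs must come from the same pairwise
    alignment and therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Invented toy alignment (not real receptor sequences).
toy_a = "MKT-AYIAK"
toy_b = "MRTLAY-SK"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```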

Despite the fact that histamine receptors share only moderate sequence identity with each 
other, the H1, H2, H3, and H4 receptors were each recognized as being a G-protein coupled 
receptor having G-protein mediated signal transduction activity based on sequence identity. For 
example, Yamashita et al. (1991) Biochem. 88:11515-11519 (provided as Appendix M) describe 
the cloning of the H1 receptor and note that "[t]he histamine H1 receptor is highly similar to other 
G protein-coupled receptors." Yamashita at 11518. Similarly, Gantz et al. (1991) Proc. Natl. 
Acad. Sci. 88:429-433 (provided as Appendix N) describe the cloning of the H2 receptor and 
note that "comparison of the deduced amino acid sequence to that of other G-protein-linked 
receptors with presumed seven-transmembrane motifs revealed extensive homology." The H3 
histamine receptor was identified and cloned based on a high degree of sequence similarity with 
biogenic amine GPCRs (Lovenberg et al. (1999) Mol. Pharmacol. 55:1101-1107, provided as 
Appendix O). Finally, as described in the Appeal Brief, Nguyen et al. describe the cloning of the 
H4 receptor based on a query of GenBank to identify sequences sharing sequence similarity with 
GPCRs. Thus, the G-protein mediated signal transduction activity of all of the histamine 
receptors was accurately predicted based on sequence similarity with known GPCRs. 
Accordingly, the Examiner's attempt to use Nguyen to discredit functional prediction methods is 
misguided. 

6. The tumor suppressor activity of p73 was predicted based on sequence 
identity with the known tumor suppressor p53. 

In the Appeal Brief, Appellants cite Dickman et al (1997) Science 277:1605-1606, which 
teaches that the tumor suppressor activity of the p73 polypeptide was determined based on 
sequence similarity with the transcription activation, DNA-binding, and oligomerization domains 
of the known tumor suppressor protein p53. In response, the Examiner argues that Dickman 
teaches that the p73 gene is deleted in certain cancers. However, a careful reading of Dickman 
finds that the original determination of p73 protein's tumor suppression activity was made on the 
basis of sequence similarity alone. Dickman teaches that p73 was identified in a screen for genes 
that respond to certain immune system regulators. Dickman states, "[w]hen the French team 
sequenced the many potential targets their screen had turned up, they were shocked to find out 
that one false positive had remarkable similarities to p53." Dickman at 1605. It was only after 
p73's tumor suppression activity had been predicted on the basis of sequence similarity with p53 
that the investigators thought to look for alterations in the p73 gene in cancer patients. Thus, 
Dickman is an excellent example of the value of sequence comparison-based approaches to 
discovery of new genes. 

7. Kliewer demonstrates the successful identification of novel nuclear 
receptors based on sequence similarity with the ligand-binding domain of known nuclear 
receptors. 

In the Appeal Brief, Appellants cite Kliewer et al (1998) Cell 92(1): 73-82 as an 
additional example of the accurate determination of a protein's function based on the presence of 
functional domains. The Examiner seeks to discredit this argument by noting that the PXR.1 
amino acid sequence is identical to the PXR.2 amino acid sequence except for a 41 amino acid 
deletion resulting from alternative splicing. This statement misses the point of the reference, 
which does not teach the isolation of the PXR.2 coding sequence based on the PXR.1 coding 
sequence but instead describes the cloning of both the PXR.1 and PXR.2 coding sequences based 
on sequence identity with motifs characteristic of known nuclear receptors. Kliewer states, "[i]n 
an effort to identify new members of the nuclear receptor family, we performed a series of motif 
searches of public EST databases. These searches revealed a clone . . . that had homology to the 
ligand-binding domain of a number of nuclear receptors." Kliewer at 74. Kliewer teaches that 
this EST was then used to clone the nuclear receptor PXR.1 and its splice variant PXR.2. Thus, 
Kliewer describes yet another successful use of sequence similarity with functional domains to 
predict protein function. 

B. The evidence presented by Applicants supports a finding that the present 
invention satisfies the requirement of utility. 

Applicants again note that these arguments are presented for the first time on appeal 
because the Examiner earlier indicated that the only issue was whether the disclosed sequence 
actually was a GPCR. Now, the Examiner asserts that even if the disclosed sequences are 
GPCRs, utility is not established. Because the Examiner has changed the utility rejection, 
Applicants have not had the opportunity to fully address the Examiner's arguments. Applicants 
here present these arguments in response to the Examiner's new and revived grounds of 
rejection. 

1. The 2871 receptor is useful in selectivity screening and therefore has a 
"well-established" utility. 

The Examiner has rejected claims 73, 74, 81, and 88-96 under 35 U.S.C. §101 on the 
grounds that the claimed invention "lacks patentable utility." (Feb. 12, 2001 Office Action, page 
3). This does not correctly reflect the view in the art, where it is known that "[h]istorically, the 
superfamily of GPCRs has proven to be among the most successful drug targets and 
consequently these newly isolated orphan receptors have great potential for pioneer drug 
discovery." (Stadel et al. (1997) Trends Pharmacol. Sci. 18:430-436; provided as Appendix P). 
Those of skill in the art recognize that the identification of a novel member of the G-protein 
coupled receptor family provides an immediate benefit. In addition to serving as reagents and 
targets in the diagnosis and treatment of 2871-mediated disorders as described in the 
specification on page 48 et seq., all members of the GPCR protein family have utility in 
selectivity screening of candidate drugs that target GPCRs. It is known in the art that the clinical 
usefulness of a therapeutic compound is determined not only by its ability to bind and modulate a 
molecular target of interest, but also by its selectivity. Drugs that bind selectively to their 
molecular target are highly preferred over those that bind to structurally-related molecules, as the 
selective compounds are far less likely to have unwanted side effects in clinical use. See, for 
example, Hartig (1993) NIDA Res. Monogr. 134: 58-65, entitled, "The use of cloned human 
receptors for drug design," provided as Appendix Q; Fraser (1995) J. Nucl. Med. 36 (6 Suppl): 
17S-21S, provided as Appendix R. Thus, an important component of any drug development 
strategy is determining the selectivity of the candidate drug for the molecular target of interest 
over structurally-related polypeptides. The effectiveness of selectivity screening in uncovering 
interactions that may result in undesirable clinical side-effects increases in proportion with the 
number of structurally-related polypeptides screened. In this situation, the usefulness of these 
structurally-related polypeptides is not dependent on their biological role or ligand-binding 
properties; their utility comes from the fact that they share significant sequence identity with the 
molecular target of the candidate drug. 

An example of the use of orphan receptors in selectivity screening is found in Goodwin et 
al. (2000) Molecular Cell 6:517-526, provided as Appendix S. This reference is directed to the 
identification of a specific agonist for FXR, an orphan nuclear receptor that regulates bile acid 
synthesis and is a target in the treatment of cholestasis. (See generally, Niesor et al. (2001) Curr. 
Pharm. Des. 7: 231-259). Goodwin states that many previously-identified FXR ligands interact 
with other proteins including bile-acid-binding proteins and transporters (Goodwin at page 518, 
column 1). In order to identify a compound that selectively modulates FXR, the authors screened 
for compounds that modulated FXR activity and then tested these compounds for their ability to 
activate other nuclear receptors that share structural similarity with FXR. Figure 1C of Goodwin 
shows that the compound GW4064 potently activates FXR but does not modulate the activity of 
the other nuclear receptors tested. Note that the nuclear receptor panel screened in Figure 1C 
includes the orphan nuclear receptors SHP-1 and LRH-1 in addition to receptors having 
previously-identified ligands, illustrating that studies often include orphan receptors. 

More than 50% of prescription drugs act at GPCR targets, further showing the importance 
of GPCRs in screens for effective drugs. However, some of these drugs have efficacy problems 
and limiting side-effects because the compounds do not differentiate between receptor subtypes. 
See generally, Stadel et al., (1997) Trends Pharmacol Sci. 18: 430 (Appendix P); Lee and 
Kerlavage (1993) Molecular Biology of G-Protein-Coupled Receptors, 6 DN&P 488 (provided 
as Appendix T). Accordingly, because the GPCR protein family includes a number of key drug 
targets, members of this family share a common use in the selectivity screening of candidate 
drugs. The 2871 receptor shares a high degree of identity with the rhodopsin family of GPCRs 
(see specification Figure 2). This rhodopsin GPCR family includes targets for the treatment of 
numerous disorders including depression, anxiety, migraine, asthma, hypertension, and 
cardiovascular disorders. Thus, all members of this important class of GPCRs, including those 
disclosed in the present invention, have a specific, immediately available, real world utility in the 
selectivity screening of drugs directed at GPCR targets. 

The 2871 receptor shares a high degree of identity with the rhodopsin family of GPCRs 
and is expressed in tissues including those of particular clinical significance to hematological 
disorders, such as hematopoietic cells (see Figure 7; see also Figures 5-6 and specification pages 
6 and 19). Indeed, the 2871 gene is expressed at significant levels in all blood cell progenitors 
analyzed by the inventors. It is highly expressed in bone marrow (CD34+), G-CSF-mobilized 
peripheral blood (containing circulating progenitors derived from bone marrow) and is 
moderately expressed in CD34+ adult bone marrow and CD34+ cord blood cells. It is also highly 
expressed in megakaryocytes as well as CD41+ (CD14-) bone marrow cells. G-CSF-mobilized 
peripheral blood contains circulating progenitors derived from bone marrow. Accordingly, 
expression of the 2871 gene is relevant for treating disorders associated with the formation of 
differentiated and/or mature blood cells, such as anemia, neutropenia, and thrombocytopenia. 

The therapeutic and economic benefits that can result from selectivity screening are well 
known. One example is the events of 1994-1997 leading to Merck's marketing of the painkiller 
Vioxx, described in Gardiner Harris, The Cure: With Big Drugs Dying, Merck Didn't Merge - It 
Found New Ones, The Wall Street Journal, January 10, 2001, at A1 (provided as Appendix U). 
Merck's search for a novel pharmacologically suitable painkiller made use of in vitro screens to 
find drugs that inhibited the activity of Cox-2 but not Cox-1. Such drugs would inhibit 
prostaglandin production in most of the body but not the gut, thereby ameliorating pain while 
avoiding undesirable side effects. Candidate drugs from a collection of hundreds of synthesized 
drugs were first subjected to in vitro screening; a much smaller number of successful in vitro 
candidates advanced to in vivo screening in mice, and two successful nontoxic drugs from the 
mouse in vivo screens were advanced to even more expensive human clinical trials. Only one of 
these two drugs showed efficacy in clinical trials, ultimately received FDA approval, and is now 
being marketed as Vioxx. This example illustrates how a "real world" benefit can be obtained 
from distinguishing gene family members. 

2. The 2871 sequence has a high degree of identity to other sequences which 
have utility; therefore, the 2871 sequence has utility. 

The USPTO utility examination guidelines state, "[w]hen a class of proteins is defined 
such that the members share a specific, substantial, and credible utility, the reasonable 
assignment of a new protein to the class of sufficiently conserved proteins would impute the 
same specific, substantial, and credible utility to the assigned protein." 66 Fed. Reg. 1096. In 
the present application, Applicants have demonstrated that the 2871 receptor is a member of the 
rhodopsin family of G-protein coupled receptors. Members of this family of receptors are known 
by those of skill in the art to share a specific, substantial, and credible utility. In fact, 
it has come to Applicants' attention that a U.S. patent has issued from an international application 
disclosed by Applicants in the Supplemental IDS returned by the Examiner with paper 8 (the 
Office Action mailed 8/25/00). In U.S. Patent No. 6,063,596 (the '596 patent), with inventors 
Lai et al. and assigned to Incyte Pharmaceuticals, issued 16 May 2000, one of the disclosed 
sequences has 98% identity to Applicants' 2871 sequence. The claimed invention of the '596 
patent is described as providing human G-protein coupled receptors associated with immune 
response. Applicants' present claims are directed to methods of using the 2871 sequence of the 
present invention. Because there is an issued U.S. patent with claims to sequences with a high 
degree of identity to Applicants' 2871 sequences, the Patent Office must have found these 
sequences to have utility. Accordingly, a rejection of Applicants' present claims for lack of 
utility is inappropriate and should be withdrawn. 

3. The present invention is useful in its currently available form. 

The Examiner has stated that the specification does not provide "any evidence or 
guidance suggesting the claimed protein's activity" (Examiner's Answer at page 3) and that 
therefore doubt is cast on "whether the nucleotide sequence or its encoded protein can be used in 
any of applicants asserted utilities." (emphasis added; Examiner's Answer at page 4). 
Applicants disagree. As discussed in the specification and known in the art, GPCRs (G-protein 
coupled receptors) are responsible for G-protein mediated signal transduction. "GPCRs, along 
with G proteins and ... intracellular enzymes and channels modulated by G-proteins, are the 
components of a modular signaling system that connects the state of intracellular second 
messengers to extracellular inputs." (specification page 2; see also pp. 6, 7 & 20). 

While the Examiner's assertion of lack of utility may reflect the thinking of the pre- 
genomics era, it does not accurately describe the current state of the art in drug discovery. Those 
of skill in the art appreciate that rapid advances in technology have led to dramatic changes in the 
way research is conducted in many biomedical-related areas. "Molecular biology has had a 
dramatic influence" on active drug discovery and research projects in the pharmaceutical 
industry, particularly those involving GPCRs. See Stadel et al. (1997) Trends Pharmacol. Sci. 
18:430-436 (provided as Appendix P). The advances in molecular biology have led to what 
those in the art consider a "paradigm shift" in the way research and drug discovery are conducted. 
Id. In this new paradigm, the starting point in the process is the identification of new members of 
gene families such as the GPCR superfamily by "computational or bioinformatic 
methodologies." Stadel at 430. "Once new members of the GPCR superfamily are identified, 
the recombinantly expressed receptors are used in functional assays to search for the associated 
novel ligands. The receptor-ligand pair are then used for compound bank screening to identify a 
lead compound that, together with the activating ligand, is used for biological and 
pathophysiological studies to determine the function and potential therapeutic value of a receptor 
antagonist (or agonist) in ameliorating a disease process." Stadel at 434; see also Fraser (1995), 
J. Nucl. Med. 36 (6 Suppl): 17S-21S (Appendix R). Often, these screens are implemented in 
high-throughput format. See id. Thus, in the molecular biology field of the present invention, 
the discovery of a novel sequence is the key step, or "first link" of Cross. See, Cross v. Iizuka, 
753 F.2d 1040, 1051 (Fed. Cir. 1985) (holding that "[w]e perceive no insurmountable difficulty, 
under appropriate circumstances, in finding that the first link in the screening chain, in vitro 
testing, may establish a practical utility for the compound in question.") 

Similarly, in drug development, the key step or "first link" is the discovery of a novel 
sequence such as that of the present invention; subsequent screening steps are routinely 
performed. As those in the art note, "the potential reward of using this ["reverse molecular 
pharmacological strategy"] approach is that resultant drugs naturally will be pioneer or 
innovative discoveries, and a significant proportion of these unique drugs may be useful to treat 
diseases for which existing therapies are lacking or insufficient." Stadel at 434. 

As those in the art are aware, much is known about GPCRs but many details of GPCR 
activity remain to be resolved, including comprehensive information about the mechanisms and 
domains of previously discovered GPCRs. Despite this lack of encyclopedic knowledge about 
GPCRs, members of this gene family have been shown to bind a variety of ligands and have been 
successfully used for drug discovery. See, for example, Stadel et al. (1997) Trends Pharmacol. 
Sci. 18:430. "Because of the proven link of GPCRs to a wide variety of diseases and the 
historical success of drugs that target GPCRs, we believe that these orphan receptors are among 
the best targets of the genomic era to advance into the drug discovery process." Stadel at 436. 
"The fact that GPCRs mediate a broad spectrum of cellular events make these proteins an ideal 
target for drug interaction and therapeutics." Lee and Kerlavage (1993) Molecular Biology of G- 
Protein-Coupled Receptors, 6 DN&P 488 (Appendix T). 

4. The rejection of the claims under 35 U.S.C. §101 and §112, first 
paragraph, is inconsistent with USPTO guidelines and supporting case law. 

The Utility Examination Guidelines state, "[Applicants] need only provide one credible 
assertion of specific and substantial utility for each claimed invention to satisfy the utility 
requirement." 66 Fed. Reg. 1098. This one-utility requirement is consistent with Cross, which 
held that "[w]hen a properly claimed invention meets at least one stated objective, utility under 
§101 is clearly shown." Cross, 753 F.2d at 1046 n.9, citing Raytheon Co. v. Roper Corp., 724 
F.2d 951, 958 (Fed. Cir. 1983), cert. denied, 469 U.S. 835 (1984). Thus, the Examiner's utility 
rejection depends on the invalidity of each of Applicants' asserted uses. However, as the 
Examiner noted (at page 3 of the Office Action mailed February 12, 2001 (paper 11)), 
"applicants do indeed provide multiple well-established and specific utilities for a GPCR." 
Inexplicably, the Examiner now states (at page 5 of the Examiner's Answer mailed August 28, 
2001 (paper 18)) that "since there was no specific and substantial asserted utility or a well- 
established utility for the claimed nucleic acids and encoded proteins, credibility of the utility 
was not assessed." 

The PTO guidelines state, "[a] rejection based on lack of utility should not be maintained 
if an asserted utility for the claimed invention would be considered specific, substantial, and 
credible by a person of ordinary skill in the art in view of all evidence of record." 66 Fed. Reg. 
1098. "Credibility is assessed from the perspective of one of ordinary skill in the art in view of 
the disclosure. ..." 66 Fed. Reg. 1098. As the Examiner noted (at page 3 of the Office Action 
mailed Feb. 12, 2001 (paper 11)), Applicants "do indeed provide multiple well-established and 
specific utilities for a GPCR," and one of ordinary skill in the art would agree with the Examiner 
that the present invention satisfies the utility standard. 

The PTO utility examination guidelines also state, 

[w]here the asserted utility is not specific or substantial, a prima facie 
showing [of no specific and substantial credible utility] must establish that 
it is more likely than not that a person of ordinary skill in the art would not 
consider that any utility asserted by the Applicants would be specific and 
substantial. The prima facie showing must contain the following 
elements: (1) An explanation that clearly sets forth the reasoning used in 
concluding that the asserted utility for the claimed invention is not both specific and 
substantial nor well-established; (2) Support for factual findings relied 
upon in reaching this conclusion; and (3) An evaluation of all relevant 
evidence of record, including utilities taught in the closest prior art. 

(66 Fed. Reg. 1098). Further, "[o]ffice personnel are reminded that they must treat as true a 
statement of fact made by Applicants in relation to an asserted utility, unless countervailing 
evidence can be provided that shows that one of ordinary skill in the art would have a legitimate 
basis to doubt the credibility of such a statement" (66 Fed. Reg. 1098-99). 

This provision is consistent with the case law. See, In re Gazave, 379 F.2d 973 (C.C.P.A. 
1967) (finding that the utility standard was met where "appellant's assertions of usefulness in his 
specification appear to be believable on their face and straightforward, at least in the absence of 
reason or authority in variance"); Ex parte Dash, 27 U.S.P.Q.2d 1481, 1484 (Bd. Pat. App. & 
Inter. 1993) (holding that "[a] disclosure of a utility satisfies the utility requirement of section 101 
unless there are reasons for the artisan to question the truth of such disclosure.") Similarly, in In 
re Jolles, claims to pharmaceutical compounds and methods of use were rejected under §101 and 
§112. The court held, "it is proper for the examiner to ask for substantiating evidence unless one 
with ordinary skill in the art would accept the allegations as obviously correct" (628 F.2d 1322, 
1327 (C.C.P.A. 1980)). See also, In re Brana, 51 F.3d 1560, 1563 (Fed. Cir. 1995) (stating that 
"[o]nly after the PTO provides evidence showing that one of ordinary skill in the art would 
reasonably doubt the asserted utility does the burden shift to the Applicants to provide rebuttal 
evidence sufficient to convince such a person of the invention's asserted utility," and holding that 
the PTO did not meet this burden.) 

In the present case, the utility rejection has not been supported in the required manner. 
As discussed above, the Examiner's objections are not properly grounded in the authority cited 
and are in fact inconsistent with practices in the art. Accordingly, the Examiner has not made a 
prima facie showing of no utility and the rejection should be withdrawn. 

CONCLUSION 

In view of the arguments presented above, Applicants contend that each of claims 73, 74, 
81, and 88-96 is patentable. Therefore, reversal of the rejections under 35 U.S.C. § 101 and 35 
U.S.C. § 112, first paragraph, is respectfully solicited. 

It is not believed that extensions of time or fees for net addition of claims are required 
beyond those which may otherwise be provided for in documents accompanying this paper. 
However, in the event that additional extensions of time are necessary to allow consideration of 
this paper, such extensions are hereby petitioned under 37 CFR § 1.136(a), and any fee required 
therefor (including fees for net addition of claims) is hereby authorized to be charged to Deposit 
Account No. 16-0605. 

APPEALED CLAIMS 

73. A method for detecting the presence of a polypeptide having an amino acid 
sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in ATCC 
Deposit No. PTA-2369; 

said method comprising contacting the sample with a compound which selectively 
binds to any one of the polypeptides of (a) - (b) and determining whether the compound binds to 
said polypeptides in the sample. 

74. The method of claim 73, wherein the compound which binds to the polypeptide is 
an antibody. 

81. A method for modulating the activity of a polypeptide having an amino acid 
sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in ATCC 
Deposit No. PTA-2369; 

said method comprising contacting any one of polypeptides (a) - (b) or a cell 
expressing any one of polypeptides (a) - (b) with a compound which binds to the polypeptide in a 
sufficient concentration to modulate the activity of the polypeptides. 

88. A method for screening a cell to identify an agent that binds with a polypeptide 
having an amino acid sequence shown in SEQ ID NO:1 in said cell, said method comprising 
contacting said cell with an agent and detecting an interaction between said polypeptide and 
agent. 

89. A method for screening a cell to identify an agent that modulates the expression 
level or activity of the polypeptide having an amino acid sequence shown in SEQ ID NO:1 in 
said cell, said method comprising contacting said cell with an agent and detecting an interaction 
between said polypeptide and agent. 

90. The method of claim 89, wherein said cell is a blood cell. 

91. The method of claim 90, wherein said blood cell is a myeloid progenitor cell. 

92. The method of claim 91, wherein said myeloid progenitor cell is a CD34+ cell. 

93. The method of claim 89, wherein said agent increases the level or activity of said 
polypeptide. 

94. The method of claim 89, wherein said agent decreases the level or activity of said 
polypeptide. 

95. A method for assessing G-protein receptor expression in disease states of a 
patient, comprising contacting a tissue of said patient with an isolated antibody that selectively 
binds to the polypeptide shown in SEQ ID NO:1. 

96. The method of claim 95, wherein the G-protein coupled receptor expression is 
involved in signal transduction. 

Pfam: domain structure of proteins in the 7tm_1 Seed alignment 
http://pfam.wustl.edu/cgi-bin/getdomainview (5/2/2001) 

OLFJ_HUMAN P30954 OLFACTORY RECEPTOR-LIKE PROTEIN HGMP07J. 
OL15_MOUSE P23275 OLFACTORY RECEPTOR 15 (OR3). 
OLF6_RAT P23267 OLFACTORY RECEPTOR-LIKE PROTEIN F6. 
OLF1_CHICK P37067 OLFACTORY RECEPTOR-LIKE PROTEIN COR1. 
GU27_RAT P34987 GUSTATORY RECEPTOR GUST27. 
RTA_RAT P23749 PROBABLE G PROTEIN-COUPLED RECEPTOR RTA. 
TA2R_HUMAN P21731 THROMBOXANE A2 RECEPTOR (TXA2-R) (PROSTANOID TP RECEPTOR). 
PE24_HUMAN P35408 PROSTAGLANDIN E2 RECEPTOR, EP4 SUBTYPE (PROSTANOID EP4 RECEPTOR) (PGE RECEPTOR, EP4 SUBTYPE). 
UL33_HCMVA P16849 G-PROTEIN COUPLED RECEPTOR HOMOLOG UL33. 
OPSB_HUMAN P03999 BLUE-SENSITIVE OPSIN (BLUE CONE PHOTORECEPTOR PIGMENT) 



OPS3_DROME P04950 OPSIN RH3 (INNER R7 PHOTORECEPTOR CELLS OPSIN). 
OPSD_LOLFO P24603 RHODOPSIN. 
OPS1_DROME P06002 OPSIN RH1 (OUTER R1-R6 PHOTORECEPTOR CELLS OPSIN). 
V2R_HUMAN P30518 VASOPRESSIN V2 RECEPTOR (RENAL-TYPE ARGININE VASOPRESSIN RECEPTOR) (ANTIDIURETIC HORMONE RECEPTOR) (AVPR V2). 
FSHR_BOVIN P35376 FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN RECEPTOR). 
TRFR_HUMAN P34981 THYROTROPIN-RELEASING HORMONE RECEPTOR (TRH-R) (THYROLIBERIN RECEPTOR). 
NTR1_HUMAN P30989 NEUROTENSIN RECEPTOR TYPE 1 (NT-R-1) (HIGH-AFFINITY LEVOCABASTINE-INSENSITIVE NEUROTENSIN RECEPTOR) (NTRH). 
NY1R_HUMAN P25929 NEUROPEPTIDE Y RECEPTOR TYPE 1 (NPY1-R). 
NYR_DROME P25931 NEUROPEPTIDE Y RECEPTOR (NPY-R) (PR4 RECEPTOR). 
TLR1_DROME P30974 TACHYKININ-LIKE PEPTIDES RECEPTOR 86C (NKD). 
NK1R_CAVPO P30547 SUBSTANCE-P RECEPTOR (SPR) (NK-1 RECEPTOR) (NK-1R). 



GCRC_MOUSE P30731 PROBABLE G PROTEIN-COUPLED RECEPTOR FROM T-CELLS PRECURSOR (GLUCOCORTICOID-INDUCED RECEPTOR). 
CCKR_HUMAN P32238 CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR). 
BRS3_CAVPO P35371 BOMBESIN RECEPTOR SUBTYPE-3 (BRS-3). 
PAFR_CAVPO P21556 PLATELET ACTIVATING FACTOR RECEPTOR (PAF-R). 
THRR_CRILO Q00991 THROMBIN RECEPTOR PRECURSOR. 
P2Y5_CHICK P32250 P2Y PURINOCEPTOR 5 (P2Y5) (PURINERGIC RECEPTOR 5) (6H1). 
EBI2_HUMAN P32249 EBV-INDUCED G PROTEIN-COUPLED RECEPTOR 2 (EBI2). 
US28_HCMVA P09704 G-PROTEIN COUPLED RECEPTOR HOMOLOG US28 (HHRF3). 
US27_HCMVA P09703 G-PROTEIN COUPLED RECEPTOR HOMOLOG US27 (HHRF2). 
C5AR_CANFA P30992 C5A ANAPHYLATOXIN CHEMOTACTIC RECEPTOR (C5A-R). 
RDC1_CANFA P11613 G PROTEIN-COUPLED RECEPTOR RDC1. 
G10D_RAT P31392 PROBABLE G PROTEIN-COUPLED RECEPTOR G10D (NOW). 
SSR1_HUMAN P30872 SOMATOSTATIN RECEPTOR TYPE 1 (SS1R) (SRIF-2). 
OPRD_MOUSE P32300 DELTA-TYPE OPIOID RECEPTOR (DOR-1) (K56) (MSL-2). 
APJ_HUMAN P35414 PROBABLE G PROTEIN-COUPLED RECEPTOR APJ. 
GUSB_BOVIN P35350 POSSIBLE GUSTATORY RECEPTOR TYPE B (PPR1 PROTEIN). 
CKR7_HUMAN P32248 C-C CHEMOKINE RECEPTOR TYPE 7 PRECURSOR (C-C CKR-7) (CC-CKR-7) (CCR-7) (MIP-3 BETA RECEPTOR) (EBV-INDUCED G PROTEIN-COUPL 
C3X1_RAT P35411 CX3C CHEMOKINE RECEPTOR 1 (C-X3-C CKR-1) (CX3CR1) (FRACTALKINE RECEPTOR) (GPR13) (RBS11). 
CKR1_HUMAN P32246 C-C CHEMOKINE RECEPTOR TYPE 1 (C-C CKR-1) (CC-CKR-1) (CCR-1) (CCR1) (MACROPHAGE INFLAMMATORY PROTEIN-1 ALPHA RECEPTOR) (M 
CCR4_BOVIN P25930 C-X-C CHEMOKINE RECEPTOR TYPE 4 (CXC-R4) (CXCR-4) (SDF-1 RECEPTOR) (STROMAL CELL-DERIVED FACTOR 1 RECEPTOR) (FUSIN) (LEUK 
IL8A_HUMAN P25024 HIGH AFFINITY INTERLEUKIN-8 RECEPTOR A (IL-8R A) (IL-8 RECEPTOR TYPE 1) (CXCR-1) (CDW128). 



CCR5_HUMAN P32302 C-X-C CHEMOKINE RECEPTOR TYPE 5 (CXC-R5) (CXCR-5) (BURKITTS LYMPHOMA RECEPTOR 1) (MONOCYTE-DERIVED RECEPTOR 15) (MDR15). 
BRB2_HUMAN P30411 B2 BRADYKININ RECEPTOR (BK-2 RECEPTOR). 
AG2R_BOVIN P25104 TYPE-1 ANGIOTENSIN II RECEPTOR (AT1). 
AG22_MOUSE P35374 TYPE-2 ANGIOTENSIN II RECEPTOR (AT2). 
MC3R_MOUSE P33033 MELANOCORTIN-3 RECEPTOR (MC3-R). 
EDG1_HUMAN P21453 PROBABLE G PROTEIN-COUPLED RECEPTOR EDG-1. 
CB2R_HUMAN P34972 CANNABINOID RECEPTOR 2 (CB2) (CB-2) (CX5). 
CB1R_HUMAN P21554 CANNABINOID RECEPTOR 1 (CB1) (CB-R) (CANN6). 
ACM1_HUMAN P11229 MUSCARINIC ACETYLCHOLINE RECEPTOR M1. 
AA1R_BOVIN P28190 ADENOSINE A1 RECEPTOR. 
5H2A_CRIGR P18599 5-HYDROXYTRYPTAMINE 2A RECEPTOR (5-HT-2A) (SEROTONIN RECEPTOR) (5-HT-2). 
5H5A_MOUSE P30966 5-HYDROXYTRYPTAMINE 5A RECEPTOR (5-HT-5A) (SEROTONIN RECEPTOR) (5-HT-5). 



5H6_RAT P31388 5-HYDROXYTRYPTAMINE 6 RECEPTOR (5-HT-6) (SEROTONIN RECEPTOR) (ST-B17). 
HH2R_CANFA P17124 HISTAMINE H2 RECEPTOR (GASTRIC RECEPTOR T). 
D2DR_BOVIN P20288 D(2) DOPAMINE RECEPTOR. 
A1AD_HUMAN P25100 ALPHA-1D ADRENERGIC RECEPTOR (ALPHA 1D-ADRENOCEPTOR) (ALPHA-1A ADRENERGIC RECEPTOR). 
DADR_HUMAN P21728 D(1A) DOPAMINE RECEPTOR. 
B1AR_HUMAN P08588 BETA-1 ADRENERGIC RECEPTOR. 
5HT1_DROME P20905 5-HYDROXYTRYPTAMINE RECEPTOR 1 (5-HT RECEPTOR) (SEROTONIN RECEPTOR). 
5H7_HUMAN P34969 5-HYDROXYTRYPTAMINE 7 RECEPTOR (5-HT-7) (5-HT-X) (SEROTONIN RECEPTOR) (5HT7). 
5H1B_HUMAN P28222 5-HYDROXYTRYPTAMINE 1B RECEPTOR (5-HT-1B) (SEROTONIN RECEPTOR) (5-HT-1D-BETA) (S12). 
5H1A_HUMAN P08908 5-HYDROXYTRYPTAMINE 1A RECEPTOR (5-HT-1A) (SEROTONIN RECEPTOR) (5-HT1A) (G-21). 

Abstract 

In the wake of the numerous now-fruitful genome projects, we have witnessed a 'tsunami' of sequence data and 
with it the birth of the field of bioinformatics. Bioinformatics involves the application of information technology to 
the management and analysis of biological data. For many of us, this means that databases and their search tools 
have become an essential part of the research environment. However, the rate of sequence generation and the 
haphazard proliferation of databases have made it difficult to keep pace with developments, even for the 
cognoscenti. Moreover, increasing amounts of sequence information do not necessarily equate with an increase in 
knowledge, and in the panic to automate the route from raw data to biological insight, we may be generating and 
propagating innumerable errors in our precious databases. In the genome era upon us, researchers want rapid, easy- 
to-use, reliable tools for functional characterisation of newly determined sequences. For the pharmaceutical industry 
in particular, the Pandora's box of bioinformatics harbours an information-rich nugget, ripe with potential drug 
targets and possible new avenues for the development of therapeutic agents. This review outlines the current status 
of the major pattern databases now used routinely in the analysis of protein sequences. The review is divided into 
three main sections. In the first, commonly used terms are defined and the methods behind the databases are briefly 
described; in the second, the structure and content of the principal pattern databases are discussed; and in the final 
part, several alignment databases, which are frequently confused with pattern databases, are mentioned. For the 
new-comer, the array of resources, the range of methods behind them and the different tools required to search 
them can be confusing. The review therefore also briefly mentions a current international endeavour to integrate the 
diverse databases, which effort should facilitate sequence analysis in the future. This is particularly important for 
target-discovery programmes, where the challenge is to rationalise the enormous numbers of potential targets 
generated by sequence database searches. This problem may be addressed, at least in part, by reducing search 
outputs to the more focused and manageable subsets suggested by searches of integrated groups of family-specific 
pattern databases. © 2000 Elsevier Science Ltd. All rights reserved. 

Keywords: Bioinformatics; Similarity search; Sequence alignment; Pattern recognition; Function annotation 

1. Introduction 

Ten years from the dawn of the field of bioinformatics, we are harvesting the abundant fruits of a 
variety of genome projects and, in spite of early flood warnings, the resultant torrent of sequence 
information has all but broken our databanks. Biological databases are now a central part of the 
research environment, but many have evolved simply as a by-product of a particular individual's 
research project, with no thought that they might one day become valuable international treasures. 
Consequently, some have not stood the test of time (most do not survive beyond the first five 
years [1]). Others are creaking under the strain of information overload, their underlying 
technologies never having been designed to cope with such volumes of data. Still others have 
managed to survive via collaborative efforts, some on an international scale. The protein sequence 
database (PSD), for example, evolved in the early 1960s from Margaret Dayhoff's research on the 
evolutionary relationships among proteins [2]. By 1980, the collection had grown to (a mere) 200 
sequences [3], which in the last two decades has increased more than 600 fold to ~131,000 
(release 61, June 1999). The PSD is now maintained collaboratively by PIR-International [4] and is 
one of the most comprehensive protein sequence collections currently available. 

Today, there are hundreds of databanks around the world housing information at the levels of the 
genome, the proteome and even the metabolome [5]. The endeavour to cope with and rationalise 
these vast quantities of data has required global co-operation and ever increasing levels of 
automation in data handling and analysis. However, automation carries a price. In the field of 
genomics, for example, although software robots are essential to the process of functional 
annotation of newly determined sequences, they pose a threat to information quality because they 
can introduce and propagate mis-annotations [6]. The curators are aware of this and always strive 
to improve the quality of their resources, but databases are nevertheless historical products (or, 
in some cases, historical accidents!) and are therefore far from perfect. To get the most from 
current biological databases it is thus important to have an understanding both of their powers 
and of their pitfalls. 

The first step towards functional characterisation of a new sequence usually involves trawling a 
sequence database with tools such as BLAST [7] or FASTA [8]. Such searches quickly reveal 
similarities between the query and a range of database sequences. The trick then lies in the 
reliable inference of homology (the verification of a divergent evolutionary relationship) and, 
from this, the inference of function. Ideally, a search output will show unequivocal similarity to 
a well-characterised protein over the full length of the query. At worst, an output will reveal no 
significant hits, but the usual scenario is a list of partial matches to diverse proteins, many of 
them uncharacterised and some with dubious or contradictory annotations [9]. 

There are various reasons for this sort of con- 
fusion. For example, the increasing size of 
sequence databases and their population by 
greater numbers of poorer quality partial 
sequences (such as expressed sequence tags)
give rise to an increasing likelihood that high-scoring
matches will be made to a query simply
by chance. So-called low-complexity matches, in 
particular, may swamp search outputs — these 
are regions within a sequence that have high den- 
sities of particular residues (e.g. poly-GxP, such 
as occurs in repetitive, often tightly structured 
sequences like collagen; or polyglutamine tracts 
that occur in Huntington's disease protein; and
so on). Although mechanisms are available for 
masking such sequences, their incautious use may 
also create complications. The modular and/or 
domain nature of many proteins also causes pro- 
blems on different levels. First, when matching 
multidomain proteins, it may not be clear which 
domain or domains correctly correspond to the 
query. Second, even if the right domain has been 
identified, it may not be appropriate to transfer 
the functional annotation to the query because 
the function of the matched domain may be 
different, depending on its precise biological context.
Similar issues arise with the existence of
multigene families, because database search techniques
cannot differentiate between a matched
orthologue (the functional counterpart of a
sequence in another species) and a matched paralogue
(a homologue that performs different but
related functions within the same organism).

Fig. 1. At the heart of sequence analysis methods is the multiple sequence alignment. Application of these methods involves the
derivation of some kind of representation of conserved features of the alignment, which may be diagnostic of structure or function.
Various terms are used to describe the different types of data representation, as shown. Within a single conserved region (motif),
the sequence information may be reduced to a single consensus expression (a regular expression), often simply referred to as a
pattern (here, R-W-x(2)-[AG]-C-x-[NQ]). In this example, square brackets indicate residues that are allowed at this position of the
motif and x denotes any residue, the (2) indicating that any residue can occupy two consecutive positions in the motif. The term
used to describe groups of motifs in which all the residue information is retained within a set of frequency (identity) matrices is a
fingerprint. Adding a scoring scheme to such sets of frequency matrices results in position-specific weight matrices, or blocks.
Using information from extended conserved regions that include gaps (usually referred to as domains) gives rise to profiles; and
probabilistic models derived from alignment profiles are termed hidden Markov models.

Achieving consistent, reliable functional assign- 
ments can be a complicated process. As a result, 
in addition to routine searches of the sequence 
databases, it is now customary to extend search
strategies to include a range of 'value-added' or
pattern databases. These distil information within 
groups of related sequences into potent descrip- 
tors or discriminators that aid family diagnosis. 
Searching pattern databases is more sensitive and 
selective than sequence database searching 
because derived family discriminators can detect 
weaker regions of similarity. Different analytical 
approaches have been used to create a bewilder- 
ing array of discriminators, which are variously 
termed regular expressions, rules, profiles, signatures,
fingerprints, blocks, etc. [10]; these terms
are summarised in Fig. 1. The different descriptors
have different diagnostic strengths and weaknesses
and different areas of optimum application,
and have been used to generate different pattern
databases, which also tend to differ in content!
The aim of this review is to provide an overview
of the current status of pattern and alignment
databases in common use and to provide pointers
on how best to use them. As this is a rapidly
developing area, a list of Web addresses is given
in Table 1 to allow readers to obtain the most
up-to-date information on the resources discussed.

Table 1

Web addresses of pattern and alignment databases in common use; for a more exhaustive list, refer to the annual database
issue of Nucleic Acids Research (http://www3.oup.co.uk/nar/)

Database    Web address
PROSITE     http://www.expasy.ch/prosite/
BLOCKS      http://www.blocks.fhcrc.org/
PRINTS      http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
IDENTIFY    http://dna.Stanford.EDU/identify/
Profiles    http://www.isrec.isb-sib.ch/software/PFSCAN_form.html
Pfam        http://www.sanger.ac.uk/Software/Pfam/
ProDom      http://www.toulouse.inra.fr/prodom.html
SBASE       http://www.icgeb.trieste.it/sbase/
PIR-ALN     http://www-nbrf.georgetown.edu/pirwww/search/textpiraln.html
PROT-FAM    http://vms.mips.biochem.mpg.de/mips/programs/classification.html
DOMO        http://www.infobiogen.fr/~gracy/domo/
ProClass    http://pir.georgetown.edu/gfserver/proclass.html
ProtoMap    http://www.protomap.cs.huji.ac.il/
PIMA        http://dot.imgen.bcm.tmc.edu:9331/seq-search/protein-search.html
ProWeb      http://www.proweb.org/kinesin/ProWeb.html



2. The methods behind the databases 

At the heart of the analysis methods that 
underpin pattern databases is the multiple 
sequence alignment. When building an alignment, 
as more distantly related sequences are included, 
insertions are often required to bring equivalent
parts of adjacent sequences into the correct register,
as illustrated schematically in Fig. 1. As a
result of this gap insertion process, islands of 
conservation emerge from a backdrop of muta- 
tional change. These conserved regions (typically 
around 10-20 amino acids in length) tend to cor- 
respond to the core structural or functional el- 
ements of the protein; they are most commonly 
termed motifs, but are also referred to as blocks, 
segments or features. 

Several techniques have evolved to exploit the 
conservation encoded in sequence alignments, as 
shown in Fig. 2 [11]. Broadly, the methods fall 
into three categories, depending on whether they 
use single motifs, multiple motifs or full domain 
alignments. Whatever the approach, all involve 
the derivation of some kind of discriminatory 
representation of the conserved alignment el- 
ements — essentially, the conserved motifs pro- 
vide a characteristic signature or fingerprint for 
the family, which can be used to facilitate diag- 
nosis of future query sequences. 

The diagnostic success of the different methods 
depends on how reliably true family members 
(true positives) can be distinguished from non- 
family members (true negatives). In practice, 
there is a crucial balance between the number of 
incorrect matches that are made (false positives) 
and the number of correct matches that are 
missed (false negatives) at a given scoring 
threshold. As shown in Fig. 3, for a given search, 
the distribution of true positive matches must be 
resolved from that of the true negatives, such 
that the overlap between them is minimised or 
eliminated. This is important because, for 
matches in the overlapping area, it can be diffi- 
cult or impossible to determine which are correct 
(statistical approaches are used to assign confi- 
dence levels to matches in this area, but math- 
ematical significance does not give biological 
proof). The different analytical methods that 
have been designed to improve the resolving 
power of database searches are outlined below. 
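
The trade-off between false positives and false negatives at a given scoring threshold can be made concrete with a small sketch. The scores and labels below are invented purely for illustration; they are not taken from any real search output.

```python
def confusion_counts(scored_hits, threshold):
    """Count true/false positives and negatives at a scoring threshold.

    scored_hits: iterable of (score, is_family_member) pairs.
    A hit is reported as a match when its score reaches the threshold.
    """
    tp = fp = fn = tn = 0
    for score, is_member in scored_hits:
        reported = score >= threshold
        if reported and is_member:
            tp += 1
        elif reported and not is_member:
            fp += 1
        elif not reported and is_member:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical scores: family members tend to score higher, but the two
# distributions overlap between roughly 40 and 60.
hits = [(95, True), (80, True), (55, True), (45, True),
        (58, False), (42, False), (30, False), (10, False)]

for threshold in (40, 50, 60):
    print(threshold, confusion_counts(hits, threshold))
```

Raising the threshold in this toy data set removes false positives only at the cost of missing true family members, which is the balance described above.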

2.1. Single-motif methods

Of the various approaches, single-motif (regular
expression pattern) methods are easiest to
understand. The idea is that a particular protein
family can be characterised by the single most
conserved, often functionally important, region
(e.g. an enzyme active site) observed in a
sequence alignment. The motif is reduced to a
consensus expression in which all but the most
significant residue information is discarded. For
example, the short expression D-[ALV]-x-{YW}-T
means that a conserved aspartic acid (D) residue
is followed by a hydrophobic residue, which
may be alanine (A), leucine (L) or valine (V); this
is followed by an arbitrary residue (x) and any
residue except tyrosine (Y) or tryptophan (W);
and finally a conserved threonine (T). No other
residues or residue combinations are tolerated by
the expression; matches to it must therefore be
exact, or will be disregarded.

Fig. 2. Illustration of the three principal methods for building pattern databases: i.e. using single motifs, multiple motifs and full
domain alignments. Single-motif (regular expression pattern) approaches have given rise to the PROSITE and IDENTIFY databases;
multiple-motif methods have spawned the BLOCKS and PRINTS databases; and domain alignment methods have resulted
in the Profiles and Pfam resources.

Fig. 3. Resolving true and false matches (plot of the number of matches, with the scoring threshold marked). In a database search,
the desire is to establish which sequences are related to the
query (i.e. are true positive) and which are unrelated (true
negative). At a given scoring threshold, it is likely that several
unrelated sequences will match erroneously (so-called false
positives) and several correct matches will fail to be diagnosed
(false negatives). In sequence analysis, the challenge is to
improve diagnostic performance by capturing all (or the majority)
of true positive family members, including no (or few)
false positives and minimising or precluding false negatives.

So rigid is this syntax that regular expression 
patterns do not perform well when used to rep- 
resent highly divergent protein families. For 
example, these patterns will fail to match signifi- 
cant sequences if they contain a single amino 
acid difference — hence, the sequence DAMYT 
is a mis-match, in spite of matching the above ex- 
pression in all but one position (it has a forbid- 
den tyrosine as its fourth residue). Conversely, a 
pattern will match anything that corresponds to 
it exactly, regardless of whether it is a true family 
member. The problem is that matches to single 
motifs lack biological context — a match to a 
pattern is just a match to a pattern and may well 
only be fortuitous. To assess the likelihood of a 
match being 'real', it must be verified with corro- 
borating evidence, whether via other database 
searches, the literature, experiment, etc. 
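
The all-or-nothing behaviour of regular expression patterns can be sketched in a few lines. The helper below, prosite_to_regex, is a hypothetical name introduced for illustration and handles only the syntax elements described in the text (fixed residues, x, square brackets, curly brackets and a repeat count); it is not the code used by any of the databases.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported elements: a fixed residue (D), any residue (x), allowed
    residues ([ALV]), forbidden residues ({YW}) and a repeat count (x(2)).
    """
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?"
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = re.fullmatch(element_re, element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        body, repeat = match.groups()
        if body == "x":
            body = "."                        # any residue
        elif body.startswith("{"):
            body = "[^" + body[1:-1] + "]"    # anything except these residues
        parts.append(body + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("D-[ALV]-x-{YW}-T")
print(regex)                                  # D[ALV].[^YW]T
print(bool(re.fullmatch(regex, "DAMPT")))     # True
print(bool(re.fullmatch(regex, "DAMYT")))     # False: tyrosine is forbidden
```

A single forbidden residue is enough to turn a match into a non-match, and nothing in the result indicates how close the miss was.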



Table 2

Overlapping sets of amino acids and their properties; these
are used to create the permissive regular expressions used as
the basis of the IDENTIFY resource

Residue property      Residue groups
Small                 Ala, Gly
Small hydroxyl        Ser, Thr
Basic                 Lys, Arg
Aromatic              Phe, Tyr, Trp
Basic                 His, Lys, Arg
Small hydrophobic     Val, Leu, Ile
Medium hydrophobic    Val, Leu, Ile, Met
Acidic/amide          Asp, Glu, Asn, Gln
Small/polar           Ala, Gly, Ser, Thr, Pro


An approach that addresses the strict nature of 
exact regular expression matching is to assign 
amino acid residues to distinct, but overlapping, 
substitution groups corresponding to various bio- 
chemical properties (e.g. charge and size), as 
shown in Table 2. This is a biologically sensible 
approach because each amino acid has several 
properties and can serve different functions, 
depending on its biochemical context [12]. 
However, although the technique is more flexible, 
its inherent permissiveness brings with it an inevi- 
table signal-to-noise trade-off — i.e. resulting 
patterns not only have the potential to make 
more true positive matches, but they will conse- 
quently also match more false positives. For 
example, the sequence DAMPS, which would be 
excluded by the exact regular expression above, 
would be matched by the permissive one (because 
Ser and Thr belong to the same group), even if 
threonine were biologically mandatory at the last 
position of the motif. 
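
The effect of substitution groups can be illustrated as follows. The groups are those of Table 2, written in one-letter code; the rule of relaxing a fixed residue to the smallest group that contains it is an assumption made for this sketch and is not claimed to be the procedure used to build IDENTIFY.

```python
import re

# Substitution groups from Table 2 in one-letter code.
GROUPS = ["AG", "ST", "KR", "FYW", "HKR", "VLI", "VLIM", "DENQ", "AGSTP"]


def permissive_class(residue):
    """Return a regex class for the smallest group containing the residue.

    Residues that belong to no group are returned unchanged.
    """
    candidates = [group for group in GROUPS if residue in group]
    if not candidates:
        return residue
    return "[" + min(candidates, key=len) + "]"


exact = "D[ALV].[^YW]T"
# Relax only the final, fixed threonine of the exact expression.
permissive = "D[ALV].[^YW]" + permissive_class("T")

print(permissive)                                # D[ALV].[^YW][ST]
print(bool(re.fullmatch(exact, "DAMPS")))        # False
print(bool(re.fullmatch(permissive, "DAMPS")))   # True
```

The permissive expression accepts DAMPS whether or not serine is biologically acceptable at that position, which is the source of the extra noise.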

2.2. Multiple-motif methods 

In response to the problems inherent in single- 
motif methods, diagnostic techniques sub- 
sequently evolved to exploit multiple motifs. 
Within a sequence alignment, it is usual to find 
not one, but several motifs that characterise the 
aligned family. Diagnostically, it makes sense to 
use many or all such conserved regions to build a 
family signature or fingerprint. In a database
search, there is then a greater chance of identifying
a distant relative, whether or not all parts of
the signature are matched. For example, a 
sequence that matches only four of seven motifs 
may still be diagnosed as a true match if the 
motifs are matched in the correct order in the 
sequence and the distances between them are 
consistent with those expected of true neighbour- 
ing motifs. The ability to tolerate mis-matches, 
both at the level of individual residues within 
motifs and at the level of motifs within the com- 
plete signature, renders multiple-motif matching 
a powerful diagnostic approach. 
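
The order and distance test for partial matches might be sketched as below. The function name, the tolerance of 10 residues, the minimum of four motifs and all positions are hypothetical values chosen for illustration; real resources use their own scoring and statistics.

```python
def plausible_partial_match(hits, expected_starts, tolerance=10, min_motifs=4):
    """Check motif hits for correct order and consistent spacing.

    hits: dict mapping a motif index (its order in the fingerprint) to the
          start position of its match in the query sequence.
    expected_starts: typical start position of each motif in known family
          members, indexed by motif order.
    """
    if len(hits) < min_motifs:
        return False
    matched = sorted(hits)                    # motif indices in fingerprint order
    for i, j in zip(matched, matched[1:]):
        observed = hits[j] - hits[i]
        expected = expected_starts[j] - expected_starts[i]
        if observed <= 0:                     # motifs occur in the wrong order
            return False
        if abs(observed - expected) > tolerance:
            return False                      # spacing inconsistent with family
    return True


expected_starts = [20, 60, 95, 140, 180, 230, 270]    # a seven-motif fingerprint

good = {0: 25, 2: 102, 3: 145, 6: 278}        # four of seven, sensible layout
shuffled = {0: 145, 2: 102, 3: 25, 6: 278}    # same motifs, wrong order

print(plausible_partial_match(good, expected_starts))       # True
print(plausible_partial_match(shuffled, expected_starts))   # False
```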

Different multiple-motif methods have arisen, 
depending both on the technique used to detect 
the motifs and on the scoring method employed. 
Probably the simplest to understand is the tech- 
nique of fingerprinting [13]. Here, groups of con- 
served motifs are excised from a sequence 
alignment and used to create a series of fre- 
quency (identity) matrices — no mutation or 
other similarity data are used to weight the 
results. The scoring scheme is thus based on the 
calculation of residue frequencies for each pos- 
ition in the motifs, summing the scores of identi- 
cal residues for each position of a retrieved 
match. However, the main strength of this 
approach also gives rise to its main weakness. In 
other words, because the method exploits 
observed residue frequencies, the scoring matrices 
are sparse and thus perform cleanly (with little 
noise) and with high specificity; at the same time, 
their absolute scoring potential is limited by the 
nature of the observed data. For richly populated 
families, this is not a problem, because the result- 
ing matrices will reflect the constituent sequence 
diversity; but for poorly populated families, the 
matrices may be too sparse and may not encode 
sufficient variation to be able to detect distant 
relatives reliably, if at all. 
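
A minimal sketch of frequency (identity) scoring shows both the cleanliness and the sparseness of the approach. The four-sequence motif is invented for illustration, and the scoring shown is a simplified reading of the description above rather than the PRINTS implementation.

```python
from collections import Counter


def frequency_matrix(aligned_motif):
    """Build one residue-count table per column of an ungapped motif."""
    length = len(aligned_motif[0])
    if any(len(seq) != length for seq in aligned_motif):
        raise ValueError("motif sequences must have equal length")
    return [Counter(column) for column in zip(*aligned_motif)]


def score(matrix, segment):
    """Sum, over positions, how often the segment's residue was observed."""
    return sum(column[residue] for column, residue in zip(matrix, segment))


motif = ["DAMPT", "DLKPT", "DVMAT", "DAMPT"]   # hypothetical seed motif
matrix = frequency_matrix(motif)

print(score(matrix, "DAMPT"))   # 16: every residue was seen in the seed
print(score(matrix, "DIQGT"))   # 8: I, Q and G were never observed, so score 0
```

Isoleucine at the second position scores nothing here even though it resembles the observed A, L and V, because only identities count; with so few seed sequences the matrix cannot reward such a relative.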

One way to address this problem is to use mu- 
tation or substitution matrices to weight noniden- 
tical residue matches. Commonly used scoring 
matrices include the PAM [14] and BLOSUM 
series [15]. The former is based on the concept of 
the point accepted mutation (PAM). PAM 250 is 
often used as a default matrix in comparison programs
because it gives similarity scores equivalent
to 20% matches remaining between two
sequences, the twilight zone [16] of similarity. 
The BLOSUM matrices, which are derived from 
observed substitutions in blocks of aligned 
sequences from the BLOCKS database, were 
designed to detect distant similarities more re- 
liably than the Dayhoff series, which can only 
infer remote relationships because their substi- 
tution rates were derived from sets of highly simi- 
lar sequences. Whatever the approach used, 
however, similarity matrices are inherently noisy 
because they indiscriminately weight both ran- 
dom matches and weak signals. Thus care should 
be taken to select a scoring matrix appropriate to 
the evolutionary distance at which relationships 
are being sought. For practical purposes, this 
means using a range of different matrices (though 
few people actually bother!). 

2.3. Profile methods

An alternative philosophy to the motif-based 
approach of protein family characterisation 
adopts the principle that the variable regions 
between conserved motifs also contain valuable 
information. Here, the complete conserved por- 
tion of the alignment (including gaps) effectively 
becomes the discriminator. The discriminator, 
termed a profile, defines which residues are 
allowed at given positions, which positions are 
highly conserved and which degenerate, and 
which positions, or regions, can tolerate inser- 
tions. The scoring system is intricate and may 
include evolutionary weights and results from 
structural studies, as well as data implicit in the 
alignment. In addition, variable penalties may be 
specified to weight against insertions and del- 
etions occurring within core secondary structure 
elements [17,18]. Profiles provide a sensitive 
means of detecting distant sequence relationships 
where only very few residues are well-conserved. 

Just as there are different ways of using motifs 
to characterise protein families, so there are 
different ways of using domain alignments to 
build family discriminators. An extension of the 
concept of profiles lies in the application of hid- 
den Markov models (HMMs) [19]. These are 
probabilistic models consisting of a number of
interconnecting states: they are essentially linear
chains of match, delete or insert states that
attempt to encode the sequence conservation
within aligned families. A match state is assigned
to each conserved column in a sequence alignment;
an insert state allows for insertions relative
to the match states; and delete states allow positions
to be skipped. Probabilities or costs (negative
log probabilities) are associated with each
emission and each transition between states. To
align a sequence is to find the highest-probability
(lowest-cost) path through the HMM. A linear
HMM is depicted in Fig. 4.

Fig. 4. Linear hidden Markov model (HMM). Each position
of an alignment is represented as a match (M), an insert (I),
or a delete (D) state in the HMM. This allows a query
sequence to be aligned by assigning the most probable state
transition to each of its residues.
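
The equivalence between the highest-probability path and the lowest-cost path follows from taking negative logarithms, as the short sketch below illustrates. The two candidate paths and their probabilities are invented; a real alignment would search over all paths (for example with the Viterbi algorithm) rather than compare two by hand.

```python
import math


def path_cost(probabilities):
    """Total cost of a path: the sum of negative log probabilities."""
    return sum(-math.log(p) for p in probabilities)


# Hypothetical per-step probabilities (emission and transition terms
# multiplied together) for two candidate paths through the same model.
path_a = [0.9, 0.8, 0.9, 0.7]   # e.g. a path through match states only
path_b = [0.9, 0.1, 0.5, 0.7]   # e.g. a path that detours via an insert state

for name, path in (("A", path_a), ("B", path_b)):
    print(name, round(math.prod(path), 4), round(path_cost(path), 3))
```

Path A has both the larger product of probabilities and the smaller summed cost, so ranking paths by either quantity gives the same answer; sums of costs are simply more convenient numerically than products of small probabilities.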

Although capable of providing precise descrip- 
tors for particular families, as with all methods, 
there are drawbacks. One problem arises from 
the specificity of profiles and HMMs. For 
example, they may be well trained for a given 
family, but an outlier that was not included in 
the training set may be missed if features of its 
sequence are incompatible with the model. 
Another problem relates to the automatic, itera- 
tive nature of HMM training; without adequate 
supervision, the process may include false posi- 
tive matches, which may ultimately corrupt the 
model and lead to profile dilution. 



3. Pattern databases 

As a consequence of the range of sources of
sequence data and the variety of ways of analysing
sequences and encoding protein families, a
number of different pattern databases have
evolved to house the different descriptors outlined
in the previous section. The databases and
their associated methods are summarised in
Table 3. Despite their differences, pattern databases
have arisen from a common principle: i.e.
homologous sequences share conserved motifs,
presumably crucial to the structure or function of
the protein, which can be used to build discriminators
for particular protein families. An
unknown query sequence may be searched
against a library of such descriptors to determine
whether or not it contains any of the predefined
characteristics and hence whether or not it can
be assigned to a known family. If the structure
and function of the family are known, searches of
pattern databases thus theoretically offer a fast
track to the inference of biological function.
Because these resources are derived from multiple
sequence information, searches of them are often
better able to identify distant relationships than
are searches of the sequence databases. However,
none of the pattern databases is yet complete.
They should therefore be used to augment
sequence searches, rather than to replace them.
The status of some of the commonly used pattern
resources is outlined below.

Table 3

Some of the major pattern databases in common use; in each
case, the primary source is noted, together with the type of
pattern stored (e.g. regular expression, fingerprint, HMM,
etc.)

Pattern database    Data source          Stored information
PROSITE             SWISS-PROT           regular expressions (patterns)
PRINTS              SWISS-PROT/TrEMBL    aligned motifs (fingerprints)
Profiles            SWISS-PROT           gapped weight matrices (profiles)
Pfam                SWISS-PROT/TrEMBL    gapped domain alignments (HMMs)
BLOCKS              PROSITE/PRINTS       aligned motifs (blocks)
IDENTIFY            BLOCKS/PRINTS        permissive regular expressions (patterns)

ID   BACTERIAL_OPSIN_1; PATTERN.
AC   PS00950;
DT   JUN-1994 (CREATED); JUN-1994 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   Bacterial rhodopsins signature 1.
PA   R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).
NR   /RELEASE=36,74019;
NR   /TOTAL=22(22); /POSITIVE=22(22); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=1; /PARTIAL=1;
CC   /TAXO-RANGE=A????; /MAX-REPEAT=1;
DR   P19585, BAC1_HALS1, T; P29563, BAC2_HALS2, T; P96787, BAC3_HALSD, T
DR   Q48334, BAC3_HALVA, T; P33970, BACH_HALHM, T; Q48315, BACH_HALHP, T
DR   Q48314, BACH_HALHS, T; P16102, BACH_HALSP, T; P33742, BACH_HALSS, T
DR   P94853, BACH_HALVA, T; P15647, BACH_NATPH, T; Q57101, BACR_HALAR, T
DR   P02945, BACR_HALHA, T; P33969, BACR_HALHM, T; P33971, BACR_HALHP, T
DR   P33972, BACR_HALHS, T; Q53496, BACR_HALSR, T; P94854, BACR_HALVA, T
DR   P25964, BACS_HALHA, T; P33743, BACS_HALSS, T; P71411, BACT_HALSA, T
DR   P42196, BACT_NATPH, T;
DR   Q53461, BACH_HALAR, P;
DR   P42197, BACT_HALVA, N;
3D   1BRD; 2BRD; 1BAC; 1BAD; 1BHA; 1BHB; 1BCT; 1SR1;
DO   PDOC00291;
//

Fig. 5. Example PROSITE entry, showing the data file for the bacteriorhodopsin pattern. When viewing PROSITE on the Web,
accession numbers are hyperlinked, allowing direct access to the corresponding SWISS-PROT entry for each sequence matched.
Similarly, the documentation file for a given pattern can be accessed via the hyperlinked PDOC accession number at the bottom of
the file.

PROSITE, the first pattern database to have
been developed, houses motifs in the form of
regular expressions [20]. The process of deriving
regular expressions first requires the construction 
of a multiple alignment and then location of the 
conserved regions. The most conserved segment 
is selected and its sequence information reduced 
to a consensus pattern, which is used to search 
SWISS-PROT [21]. Results are checked manually 
to determine how well the pattern has performed 
— ideally, there should be only true matches. 
Patterns whose diagnostic performance is com- 
promised by matching too many false positives 
are manually adjusted and SWISS-PROT is 
rescanned. The process of fine tuning is repeated 
until an optimal pattern is created. If a family 
cannot be fully characterised by a single motif, 
additional patterns are designed to encode other 
well-conserved parts of the alignment. The fine- 
tuning process is then repeated until a set of pat- 
terns is achieved that is capable of capturing all, 
or most, of the family without matching too 
many, or any, false positives. When the best pat- 
tern, or set of patterns, has been achieved, the 
results are manually annotated for inclusion in 
the database. 

Entries are deposited in PROSITE in two dis- 
tinct files: (i) a structured data file that houses 
the pattern and lists all matches in the parent 



version of SWISS-PROT, as shown in Fig. 5; (ii) 
a free-format documentation or annotation file, 
which provides details of the characterised family 
and, where known, a description of the biological 
role of the chosen motif/s and a supporting bibli- 
ography. A number of features of the data file 
are worthy of note. Apart from the identifier 
(ID) and description (DE) lines, which identify 
the characterised family, aspects of the DR and 
especially the NR lines are crucial to understand. 
The DR lines list all true (T), possible (P), false 
(F) and missed/negative (N) matches to the pat- 
tern, which results are summarised in the NR 
lines. In the example shown in Fig. 5, 22 matches
are made to the pattern, all of which are true;
one further sequence is a possible match (a fragment)
and there is a single false negative, BACT_HALVA. Inspection
of its sequence (e.g. by following its hyperlinked
accession number, P42197, from this page on the
Web) reveals that a disallowed asparagine in the
penultimate position of the motif
(RYVDWLLTTPLNV) is the reason for the mismatch.
Referring back to the pattern line, we see
that only members of the group [LIVM] are
allowed in the last three positions of the motif
(as denoted by [LIVM](3)). The quality of a pattern
can thus immediately be ascertained from
the NR lines, which are therefore probably the
most important lines to inspect when first viewing
a PROSITE entry. In some cases, there are
numerous false positives and false negatives (especially
for large super-families with substantial
numbers of divergent sequences, such as G-protein-coupled
receptors, lipocalins, etc.). Such patterns
are diagnostically unreliable and are a
limitation to the diagnostic potential of the database.
PROSITE release 15 (July 1998), with
updates to April 1999, contains 1014 entries
characterised by 1352 patterns. The database is
accessible for searching via the ExPASy Web server
and is maintained collaboratively at the Swiss
Institute of Bioinformatics.
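
The reason for the missed BACT_HALVA match can be reproduced by checking the motif segment quoted above against the pattern one position at a time. The position table below is transcribed by hand from the PA line of the example entry; the function name is hypothetical.

```python
# Allowed residues at each position of
# R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3); None stands for x (any residue).
ALLOWED = ["R", "Y", None, "DT", "W", None, "LIVMF", "ST", "T", "P",
           "LIVM", "LIVM", "LIVM"]


def disallowed_positions(segment):
    """Return (1-based position, residue) for every disallowed residue."""
    if len(segment) != len(ALLOWED):
        raise ValueError("segment length must equal the motif length")
    return [(position, residue)
            for position, (residue, allowed)
            in enumerate(zip(segment, ALLOWED), start=1)
            if allowed is not None and residue not in allowed]


print(disallowed_positions("RYVDWLLTTPLNV"))   # [(12, 'N')]
```

Only position 12 of 13, the penultimate one, fails: the asparagine is not a member of [LIVM], and that single residue is enough for the pattern to miss a genuine family member.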

BLOCKS, one of the first multiple-motif data- 
bases, is based on families already identified in 
PROSITE [22]. Here, motifs are detected auto- 
matically, using first, a modification of an algor- 
ithm that initially locates three conserved amino 
acids [23] and second, a motif-finding algorithm 
that searches for the highest scoring set of blocks 
that occur in the correct order without overlap- 
ping. Blocks found by both methods are con- 
sidered reliable and are calibrated against 
SWISS-PROT to obtain a measure of the likeli- 
hood of a chance match. The calibrated blocks 
are then concatenated into the BLOCKS data- 
base. An indication of the diagnostic power of a 
block is given in terms of a strength value — 
strong blocks are more effective than weak 
blocks (strength less than 1100) at separating 
true positives from true negatives. In searching 
the database, however, more important than the 
strength of individual blocks is the number of 
blocks matched. High-scoring matches to individual
blocks seldom have biological significance;
conversely, matches to sets of blocks from the 
same family are unlikely to have arisen by chance 
(provided they match in the correct order with 
appropriate distances between them) and a prob- 
ability value is calculated to reflect that likeli- 
hood. Release 11.0 contains 4034 blocks, 
representing 994 groups from PROSITE 15. 

Recently, several other BLOCKS databases 
have been made available. For example, in 
BLOCKS+, supplementing the entries derived
from PROSITE are blocks from families in
PRINTS that are not already in BLOCKS and
then successively, any additional blocks from
Pfam, ProDom and DOMO. BLOCKS+ is thus
comprehensive, containing 9498 blocks from 
2129 sequence groups. Complementing this 
resource is a version of PRINTS in which block- 
scoring methods have been exploited [22]. 
PRINTS' motifs tend to be deeper than those in 
BLOCKS because its source database is larger; 
the diagnostic performance of entries in the two 
resources can therefore differ, BLOCKS-format- 
PRINTS tending to be more prone to problems 
of noise. Because the BLOCKS databases are de- 
rived automatically, their entries are not anno- 
tated, but links are made to the corresponding 
PROSITE and PRINTS documentation files. The 
databases are accessible for searching via the 
Web server at the Fred Hutchinson Cancer 
Research Center in Seattle. 

PRINTS, another of the early responses to the 
diagnostic limitations of regular expression 
matching, is based on the method of fingerprint- 
ing [24]. This approach uses groups of conserved 
motifs to build diagnostic signatures of family 
membership. The process involves manual cre- 
ation of a seed alignment, and location and exci- 
sion of conserved motifs for searching SWISS- 
PROT and TrEMBL. Results are examined to 
determine which sequences have matched all the 
motifs in the fingerprint; if there are more 
matches than were in the initial alignment, the 
additional information from these new sequences 
is added to the motifs and the database is 
searched again. This iterative process is repeated 
until no further complete fingerprint matches can 
be identified. The results are then annotated 
manually (with descriptions of the family, details 
of the structural or functional relevance of the 
motifs where known, cross-references to related 
databases, bibliographic references, etc.) prior to 
inclusion in the database. 

Fingerprint diagnostic performance is indicated 
via a summary that lists how many sequences 
matched all the motifs and how many made only 
partial matches (i.e. failed to match one or more 
motifs). The fewer the partial matches, the better 
the fingerprint. The full potency of the method 
derives from the mutual context provided by
motif neighbours. The more motifs in a fingerprint,
the better able it is to identify distant relatives,
even when parts of the signature are
absent; conversely, the fewer the motifs, the 
poorer the diagnostic performance. Fingerprints 
with only two motifs are diagnostically little bet- 
ter than single motifs and are therefore more 
likely to make false positive matches. When 
searching PRINTS, probability and expect values 
are calculated to assign a measure of confidence 
to both complete and partial matches. 

Within PRINTS, motifs are encoded as un- 
gapped, un-weighted local alignments. An im- 
portant consequence of storing the motifs in this 
'raw' form is that, unlike with regular expressions 
or other abstractions, no sequence information is 
lost. Different scoring methods may thus be 
superposed onto the motifs, conferring different 
scoring potentials, and hence different perspec- 
tives, on the same data. PRINTS may therefore 
provide the raw material for other pattern data- 
bases. PRINTS release 23.0 (June 1999) contains 
1160 entries (6938 motifs), currently making it 
the most comprehensive manually annotated pat- 
tern database. The database is accessible for 
searching via the Web server in the School of 
Biological Sciences at the University of 
Manchester. 

IDENTIFY is derived automatically from 
motifs in BLOCKS and PRINTS [12]. The pro- 
gram used to create the database constructs con- 
sensus expressions from the motifs, adopting a 
permissive approach in which different residues 
are tolerated according to a set of prescribed 
groupings (Table 2). These groups correspond to 
various biochemical properties, theoretically 
ensuring that the resulting expressions have sensi- 
ble biochemical interpretations. However, as 
mentioned earlier, in practice this approach may 
lead to an increase in noise. When searching the 
resource, different levels of stringency are there- 
fore offered from which to infer the significance 
of matches, rendering the approach diagnostically 
more powerful than exact pattern matching 
(which only offers match/no-match diagnoses). 
IDENTIFY is accessible from the Web server in 
the Department of Biochemistry at 
Stanford University. 
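
The permissive, group-based construction of consensus expressions can be sketched as follows. The residue groupings here are hypothetical stand-ins (the actual groupings are those of Table 2, which is not reproduced in this section), and the function names and motif are likewise invented; this is not the program used to build IDENTIFY.

```python
import re

# Hypothetical residue groupings, for illustration only.
GROUPS = ["ILVM", "FWY", "DE", "KRH", "ST", "NQ", "AG"]


def column_pattern(column):
    """Return the expression element for one alignment column."""
    residues = set(column)
    if len(residues) == 1:
        return next(iter(residues))      # fully conserved position
    for group in GROUPS:
        if residues <= set(group):
            return "[" + group + "]"     # tolerate the whole group
    return "."                           # no single group covers the column


def consensus_expression(motif):
    """Build a permissive regular expression from an ungapped motif."""
    return "".join(column_pattern(col) for col in zip(*motif))


motif = ["DRYLA", "ERYVA", "DRFIG"]      # invented motif
expression = consensus_expression(motif)
print(expression)

# W and M never occur in the motif, yet the group-based expression accepts them.
hit = re.search(expression, "MKERWMGT")
print(hit.group(0) if hit else None)
```

The second search shows the trade-off noted in the text: tolerating whole groups lets the expression match residues never seen in the motif, which may recover true relatives but may equally admit noise.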



Profiles are discriminators distilled from 
sequence information in complete domain align- 
ments. As a result of their potency, they are used 
to complement some of the poorer regular ex- 
pressions in PROSITE, or to provide a diagnos- 
tic alternative where extreme sequence divergence 
renders the use of regular expressions inappropri- 
ate. A compendium of profiles has been created 
at the Swiss Institute for Experimental Cancer 
Research (ISREC) in Lausanne. Each profile has 
separate PROSITE-compatible data and docu- 
mentation files. This allows results that have 
been validated and annotated to an appropriate 
standard to be made available as an integral part 
of PROSITE [20]. As before, diagnostic perform- 
ance can be ascertained from the DR and NR 
lines. Profiles are less prone to make false 
matches than are regular expressions, but the 
numbers released via PROSITE are only small 
(48 in July 1998). Nevertheless, profiles that have 
not yet achieved the necessary standard of vali- 
dation and annotation (241 to date) are available 
for searching via ISREC's Web server. 

Pfam is a collection of HMMs for a range of 
protein domains [25]. The resource is based on 
two distinct classes of alignment: hand-edited 
seed alignments, which are deemed to be accu- 
rate; and an automatically clustered set derived 
from ProDom families. The seed alignments are 
used to build HMMs, to which sequences are 
automatically aligned to generate final full align- 
ments. If the initial alignments do not produce 
diagnostically sound HMMs, the seed is 
improved and the gathering process iterated until 
a good result is achieved. The methods that ulti- 
mately generate the best full alignment may vary 
for different families, so the parameters are saved 
to allow results to be reproduced. The collection 
of seed and full alignments, coupled with mini- 
mal annotations (often no more than a descrip- 
tion line), related database and literature cross- 
references and the HMMs themselves, constitute 
Pfam-A. All sequence domains that are not 
included in Pfam-A are automatically clustered 
and deposited in Pfam-B. Although the methods 
and parameters used to create the full automatic 
alignment are noted, no indication is given of the 
diagnostic performance of a given HMM. Direct 
visualisation of the final alignment is therefore 
probably the best indicator of how sound its 
HMM is likely to be. Pfam is accessible for 
searching via the Sanger Centre Web server; 
release 4.1 (July 1999) encodes 1488 domains. 

Table 4 

Some of the major alignment databases; in each case, the primary 
source is noted, together with the level of information 
stored (i.e. whether domain, family or super-family alignments) 

Alignment database   Primary source    Stored information 
ProDom               SWISS-PROT        domains 
SBASE                SWISS-PROT        domains 
ProtoMap             SWISS-PROT        families 
PIR-ALN              PIR               super-families, families and domains 
PROT-FAM             PIR               super-families, families and domains 
ProClass             SWISS-PROT/PIR    super-families, families and domains 
DOMO                 SWISS-PROT/PIR    domains and repeats 
PIMA                 Entrez            domains 

4. Alignment and family-related databases 

In addition to the range of pattern resources 
described above, several alignment databases are 
also available for searching via the Web. The 
construction of alignment and pattern databases 
is based on different principles, so the two types 
of resource should not be confused. The main 
difference between them is that alignment data- 
bases tend to be derived simply by automatic 
clustering of sequence databases. This allows 
them to be more comprehensive than pattern 
resources, because they do not depend on manual 
crafting of family discriminators. However, 
searches of alignment databases are often less 
sensitive because they are usually based on im- 
plementations of BLAST. Some well-known 
alignment resources are listed in Table 4. 

ProDom is an automatic compilation of 'homologous' 
domains [26] created via a procedure 
based on PSI-BLAST [7]. Version 99.1 contains 
44,345 domains with at least 2 sequences, of 
which 2652 are linked to the Protein DataBank 
(PDB) [27]. A recent addition to the resource is 
ProDom-CG, a compendium of domains built 
from complete genome data. The database is 
accessible for interrogation with the Sequence 
Retrieval System (SRS) [28] and for BLAST 
searching via the Web server of the Institut 
National de la Recherche Agronomique. 
Emphasis has been placed on the graphical user 
interface, which facilitates analysis of protein relationships. 
However, because the resource is derived automatically, 
no annotations or validations are provided 
and although links are made to the PDB for 
~5% of entries, these are generic links from the 
constituent sequences rather than from the 
domains themselves. Discovering the biological 
meaning of domains can thus be difficult, invol- 
ving extensive cross-checking with other 
resources. 

SBASE is a library of domain sequences de- 
rived from structural and functional segments 
annotated in SWISS-PROT, PIR or the literature 
[29]. Entries are grouped on the basis of standard 
names and further classified on the basis of 
BLAST similarity. The resource, which was 
developed to assist domain recognition, is main- 
tained collaboratively by the International Center 
for Genetic Engineering and Biotechnology 
(ICGEB), Trieste, Italy and the ABC Institute 
for Biochemistry and Protein Research, Godollo, 
Hungary. SBASE is accessible for BLAST 
searching via the ICGEB Web server; version 6.0 
(October 1998) contains 1038 groups. 

ProtoMap classifies sequences in SWISS-PROT 
into groups of related proteins [30]. Clustering is 
effected at different levels of confidence, resulting 
in a hierarchical organisation that divides the 
sequences into well-defined groups, which mostly 
correlate with biological families and superfami- 
lies. The resource was designed to help reveal re- 
lationships between families and to facilitate the 
detection of sub-families. ProtoMap release 2.0 
(July 1998) provides a classification of 72,623 
sequences. The resource is accessible for search- 
ing via the Hebrew University Web server. 

PIR-ALN is a database of annotated protein 
sequence alignments derived automatically from 
the PIR-International PSD at the National 
Biomedical Research Foundation in Washington 
[31]. The database includes alignments at super- 
family, family and so-called homology domain 
levels. Sequences are grouped in the same super- 
family if they are similar from end to end; super- 
families are further subdivided into families con- 
taining sequences that are 45% identical; and 
segments corresponding to the same domain in 
two or more super-families are the basis of 
domain alignments. All domain alignments are 
deposited in the DOMAINDB database, which is 
used to screen new sequences for already defined 
domains. The March 1999 release of PIR-ALN 
contains 3983 alignments, including 1480 super- 
family and 371 domain alignments. The resource 
can be queried with the ATLAS information 
retrieval system at the PIR Web site. 

PROT-FAM is based on an automatic cluster- 
ing of the PIR-International PSD at the Munich 
Information Center for Protein Sequences 
(MIPS) [32]. Sequences that share 50% identity 
are clustered into families, and families are 
further clustered into super-families if they share 
~30% identity. Sequences are assigned to the 
same family if they are similar from N- to C-ter- 
minus, while regions showing ~30% identity that 
do not cover the full sequence length are anno- 
tated as domains. Domains are deposited into 
the HOMDOM database, which is used to 
characterise new sequences by means of the pre- 
defined domains. For all families, super-families 
and domains that contain more than one 
sequence, alignments are created using PILEUP 
[33]. The September 1998 release of PROT-FAM 
included 6000 families with two sequences and 
~6500 families containing three or more; ~3800 
super-families derived from more than one 
family; and 361 domains. These are available for 
querying via the MIPS Web site. 
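
The identity thresholds quoted above can be made concrete with a small sketch. Several points are assumptions: identity is computed over aligned, non-gap columns (the source does not say how it is measured), the quoted 50% and ~30% figures are treated as minimum cut-offs, and the function names are hypothetical. The real clustering also considers whether similarity extends from N- to C-terminus, which is not modelled here.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def relationship(identity, family_cutoff=50.0, superfamily_cutoff=30.0):
    """Map an identity value onto the clustering level it would satisfy."""
    if identity >= family_cutoff:
        return "family"
    if identity >= superfamily_cutoff:
        return "super-family"
    return "unclustered"


# Invented aligned pairs.
close = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
distant = percent_identity("ACDEFGHIKL", "ACDQQQQQQQ")
print(close, relationship(close))
print(distant, relationship(distant))
```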

ProClass is a value-added database built upon 
the PIR-International PSD, PROSITE and 
SWISS-PROT [34]. It organises nonredundant 
SWISS-PROT and PIR sequences according to 
relationships defined collectively by PIR super- 
families and PROSITE patterns. By combining 
global similarities and motifs into a single classification 
scheme, ProClass was designed to facilitate 
identification of domain and family 
relationships, and classification of multidomain 
proteins. ProClass release 4.0 (September 1998) 
contains 122,253 sequence entries, ~60% of 
which are classified into ~3500 families. The 
resource is available for searching from the PIR 
Web server. 

DOMO is a database of 'homologous' domain 
alignments computed automatically from a non- 
redundant amalgam of SWISS-PROT and PIR 
[35]. The domains have been compiled in FASTA 
format to permit fast searching using BLAST 
and sequence alignment using CLUSTALW [36]. 
The resource was designed as an aid to determine 
domain arrangements, their evolutionary re- 
lationships and their key conserved amino acids. 
DOMO can be queried via SRS at the 
Infobiogen Web site. Release 1.2 (April 1998) 
contains 99,058 domains clustered into 8877 
sequence alignments. Query results are linked to 
other databases to provide complementary infor- 
mation on related proteins and their families. 
Where 3D structures of representative sequences 
are known, links to the atomic coordinates and 
structure classification resources are provided. If 
the domain structure is unknown, pointers are 
given to a composite secondary structure predic- 
tion obtained from a variety of different tech- 
niques. As with other automatically generated 
resources, the structure links are generic and do 
not relate directly to the domains themselves; 
understanding their biological significance can 
therefore be difficult. 

PIMA is a collection of conserved motifs generated 
by clustering the NCBI's Entrez database 
[37]. For families of two or more sequences, 
alignments are created using the pattern-induced 
multiple alignment program [38] and these are 
scanned for the presence of conserved regions. If 
an alignment contains one or more such el- 
ements, additional alignments are created by 
excision of these conserved segments. Currently, 
the PIMA database includes 22,416 alignments, 
each of which contributes a single pattern to the 
resource; it is available for searching with modi- 
fied versions of FASTA via the Baylor College of 
Medicine Search Launcher Web pages. Here, 
another database has been created by extracting 
the locations of all annotated domains and sites 
from sequences contained in the Entrez, 
PROSITE, BLOCKS and PRINTS databases. 
The BEAUTY utility incorporates this infor- 
mation directly into BLAST search results [39]; 
for each match, a schematic display allows direct 
comparison of the locations of conserved regions. 



5. Which database is best? 

The plethora of available databases presents 
bewildering choices to the would-be sequence 
analyst. Which is diagnostically most reliable? 
Which has the most useful annotations? Which is 
the most comprehensive? Which should I use? At 
first sight, the alignment resources appear to be 
the most comprehensive. But they are largely 
based on automatic clustering of sequence data- 
bases and their search tools thus tend to involve 
flavours of BLAST or FASTA, which are less 
sensitive than searches of family-specific patterns. 
It is difficult to assess the quality of particular 
resources and it would be invidious to try. Each 
has different diagnostic strengths and weaknesses, 
each offers different family coverage and different 
levels of annotation — each has certain merits 
and demerits. Nevertheless, some general points 
bear consideration. 

Automatically generated databases carry no 
annotations. The advantage of searching them is 
that they are more comprehensive than their 
manually derived counterparts. The disadvantage 
is that there may be no way to ascertain the bio- 
logical significance of a match, if indeed it has 
any (that a match has been made does not mean 
an evolutionary relationship necessarily exists). 
This is important to understand in light of 
resources that house 'homology domains': automatic 
methods detect similarities, but it is for the 
user to infer homology from supporting biological 
evidence. Related issues arise in resources 
that calculate evolutionary trees from their auto- 
matically created alignments; if levels of strin- 
gency are sufficiently high, alignments and their 
trees may be sound; but at low stringency, results 
are likely to be error prone and relationships 
should be inferred with caution. 

Amongst pattern databases, single-motif 
methods that rely on exact regular expression 
pattern-matching have diagnostic limitations; 
such methods accept only exact matches, so will fail 
to diagnose sequences that contain subtle changes 
not catered for by the pattern. Moreover, single 
motifs offer no biological context within which to 
assess the significance of a match. Multiple-motif 
approaches inherently offer improved diagnostic 
reliability by virtue of the mutual context provided 
by motif neighbours. Thus, even if a query fails 
to match all the motifs in a signature, the pattern 
of matches formed by the remaining motifs still 
allows the user to make a confident diagnosis. 
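
The limitation of exact pattern-matching can be seen in a short sketch. The pattern and both sequences are invented for illustration and are not taken from PROSITE or any other database; the function name is likewise hypothetical.

```python
import re

# Hypothetical single-motif pattern, written directly as a regular expression.
PATTERN = re.compile(r"[ST].{2}[DE]R[FY]")


def diagnose(sequence):
    """Return 'match' or 'no-match'; an exact pattern offers nothing in between."""
    return "match" if PATTERN.search(sequence) else "no-match"


member = "LASLVDRYVAIT"     # invented sequence that contains the motif
variant = "LASLVDRHVAIT"    # same sequence with one substitution (Y -> H)

print(diagnose(member))
print(diagnose(variant))
```

A single substitution outside the permitted residues turns a match into a miss, with no score to indicate how close the variant came. A multiple-motif signature would still report the motifs that did match.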

Pattern resources derived from existing data- 
bases have the limitation that they offer no 
further family coverage. Nevertheless, they have 
the advantage of implementing different analyti- 
cal methods from their source databases, thus 
offering different scoring potentials on the same 
data and furnishing important opportunities to 
diagnose relationships missed by the original im- 
plementations. 

Finally, manually annotated databases are set 
apart from their automatically created counter- 
parts by virtue of (i) providing validation of 
results and (ii) offering detailed information that 
helps to place conserved sequence information in 
structural or functional contexts. This is vital for 
the user, who not only wants to discover whether 
a sequence has matched a predefined motif, but 
also needs to understand its biological signifi- 
cance. 



6. Composite pattern databases 

If, today, comprehensive sequence analysis 
requires accessing a variety of disparate data- 
bases, gathering the range of different outputs 
and arriving at some sort of consensus view of 
the results, in the future this process should 
become more straightforward. The curators of 
PROSITE, Profiles, PRINTS, Pfam and ProDom 
are currently creating a unified database of protein 
families, termed InterPro. The aim is to provide 
a single family annotation resource, based 
on existing documentations in PROSITE and 
PRINTS and on the minimal annotations pro- 
vided in Pfam. Each InterPro family will link to 
different entries in its satellite pattern databases. 
This will simplify sequence analysis for the user, 
who will thereby have access to a central resource 
for protein family diagnosis. 

This effort is supported by the curators of the 
BLOCKS databases, who, realising the problems 
associated with providing detailed family documentation, 
are developing a dedicated protein 
family Web site, termed ProWeb [40]. This facility 
provides information about individual families 
via links to existing Web resources maintained by 
researchers in their own fields. ProWeb can facilitate 
the task of annotators by providing convenient 
access to family information and 
obviating the need for annotators themselves to 
become 'expert' on all proteins. 



7. Conclusion 

Creating and searching pattern databases are 
activities that lie at different ends of a fallible 
chain of events. We begin with a sequence align- 
ment, we create some kind of scoring function to 
encode the conservation within the alignment (a 
scoring matrix, HMM, etc.), we store the discri- 
minators in a database and we search them with 
different algorithms. Problems arise if unrelated 
sequences have crept into the alignment, which in 
turn lead to errors in the discriminators, which 
then give ambiguous or incorrect search results. 
Alternatively, the discriminators may be sound, 
but the search algorithms may not be sufficiently 
sensitive to allow unequivocal diagnosis, leading 
the user to false conclusions of family ties. If the 
user has performed this experiment on a newly 
determined sequence and submits the results to 
one of the sequence databases, the annotation 
error becomes available for mass propagation. 

Recently, there has been doom-mongering in 
the literature about the quality of our databases, 
some harbingers of misfortune predicting a future 
error catastrophe. At the same time, claims of 
success for some approaches to family classification 
and function prediction have been equally 
overdone. A more balanced view recognises that 
our databases and search routines are not per- 
fect, but with the right approach we can avoid 
the pitfalls of jumping to over-pessimistic or 
over-zealous conclusions. 

Until we have sufficient experimental data 
available, pattern and sequence databases are 
probably the best tools we have for accessing the 
functional and evolutionary clues latent in the 
sequences flooding from the genome projects. 
Pattern databases offer several benefits: (i) by dis- 
tilling multiple sequence information into family 
descriptors, trivial errors in the underlying 
sequences may be diluted; (ii) annotation errors 
may be quickly spotted if the description of one 
sequence differs from that of its family; and (iii) 
they allow specific diagnoses, placing individual 
sequences in a family context for a more 
informed assessment of possible function. By 
contrast, searches of sequence databases tend to 
reveal only generic similarities, making precise 
pinpointing of a particular biological niche more 
difficult. 

While there is some overlap between them, the 
contents of the pattern databases differ. Together 
they encode ~2000 families, including globular 
and membrane proteins, modular polypeptides 
and so on. It has been estimated that the total 
number of families might be in the range 1000 to 
10,000, so there is a long way to go before any 
of the databases can be considered complete. 
Thus, in building a search strategy, it is good 
practice to include all available pattern resources, 
to ensure that the analysis is as comprehensive as 
possible and that it takes advantage of a variety 
of search methods. Where there is consensus, 
diagnoses can be made with greater confidence. 
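
The idea of seeking consensus across resources can be sketched as a simple vote. This is an illustrative assumption rather than an established procedure: it presumes that the family labels returned by the different resources have already been mapped to a common vocabulary, and the function name, the agreement threshold and the example results are hypothetical.

```python
from collections import Counter


def consensus_diagnosis(hits, min_agreement=2):
    """Return (label, votes) if enough resources agree, else (None, votes).

    hits maps a resource name to the family it reported, or None for no hit.
    """
    votes = Counter(label for label in hits.values() if label is not None)
    if not votes:
        return None, 0
    label, count = votes.most_common(1)[0]
    if count >= min_agreement:
        return label, count
    return None, count


# Invented search results for two query sequences.
agreed = consensus_diagnosis(
    {"PROSITE": "GPCR", "PRINTS": "GPCR", "Pfam": "GPCR", "BLOCKS": None}
)
disputed = consensus_diagnosis({"PROSITE": "kinase", "PRINTS": "GPCR"})
print(agreed, disputed)
```

Where the vote is split, the sketch deliberately returns no label, leaving the decision to the analyst and to supporting biological evidence.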

Unfortunately, creating and annotating family 
descriptors is time-consuming, so pattern data- 
bases have not kept pace with the deluge of 
sequence data. Consequently, by comparison 
with the sequence repositories, they are still very 
small. Nevertheless, as they become more com- 
prehensive, as the volume of sequence data 
expands and search outputs become more com- 
plex, their diagnostic potency ensures that pat- 
tern databases will play an increasingly 
important role as the post-genome quest to 
assign functional information to raw sequence 
data gains pace. 

atrophy), and it is likely that others have 
yet to be found. A search of the working 
draft sequence yielded 286 potential 
paralogs of the 971 known human 
disease genes with entries in OMIM and 
either SwissProt or TREMBL protein 
databases. A similar screen of 603 classic 
drug target proteins identified 18 new 
potential paralogs. Together, these 
groups offer an intriguing collection of 
candidates - genes that might cause 
related disorders when mutated or that 
might encode new targets for drug 
screens. 

Our understanding of disease 
mechanisms might also lead to the 
identification of new therapeutic targets. 
Profiling gene-expression changes in 
biological systems that model disease 
might lead to the identification of 
pathways that play a crucial role in 
pathogenesis. Such an endeavor is 
currently under way for the polyglutamine 
expansion diseases. Furthermore, 
consistent genetic changes that occur in 
easily accessible tissues in model 
organisms might provide surrogate 
markers for drug screens. In addition, an 
understanding of the common 
polymorphisms that occur in drug target 
proteins might help predict which patients 
will respond appropriately to therapy. 

What lies ahead? 

With a bounty of information being served 
up, it is important to keep in mind both the 
many strengths and the limitations of the 
current data set. The working draft of the 
human genome that is accessible in publicly 
available databases includes almost one 
billion base pairs of finished sequence. 
However, nearly 75% of BACs are 
unfinished, currently consisting of as many 
as 10-20 unassembled sequence fragments 
each. Unfinished, unassembled sequence 
presents difficulties during gene mapping, 
might contain contamination and might be 
inadvertently assembled to create artificial 
duplications or deletions. Over the coming 
year, it is hoped that full coverage (8-10-fold 
redundancy) will be achieved for clones 
spanning the entire physical map, followed 
shortly thereafter by finished sequence. At 
that point, >96% of the euchromatic human 
genome will be in the database. Closing the 
remaining gaps, which might contain 
biologically important information, will 
require screening additional large-insert 
libraries, a process that is anticipated to take 
until 2003. Finally, new techniques might be 
needed to close recalcitrant gaps and to 
generate sequence from heterochromatic 
regions that probably contain highly 
polymorphic tandem repeats. 

While we eagerly await completion of 
the finished genome sequence, our ability 
to mine the information we seek is rapidly 
evolving. Gene prediction, or annotation, 
is much more difficult in humans than in 
the fly, worm or yeast as a result of the 
large size of the genome. New computer 
algorithms and high-throughput 
techniques for gene identification and 
verification will be needed. Comparisons 
with the genomes of other vertebrates will 
probably speed up this process and might 
reveal conserved regulatory regions that 
control the expression of orthologous 
genes. The Mammalian Gene Collection 
Project aims to assemble a comprehensive 
collection of full-length human cDNAs, 
providing a valuable resource for those 
studying gene function. Furthermore, by 
extending the SNP data set to include all 
common variants, the identification of 
disease genes and genetic modifiers 
should be greatly facilitated. This hope - 
of using the genome to help define causes 
and cures for human disease - underlies 
much of the excitement surrounding the 
release of the working draft. 

Techniques & Applications 

A compendium of specific motifs for diagnosing GPCR 
subtypes 

Teresa K. Attwood 



Analysis of G-protein-coupled receptor 
(GPCR) subtypes has attracted considerable 
interest because some drugs that act on 
GPCRs cause therapeutic problems as a 
result of their failure to differentiate 
between subtypes. In this article, an 
extensive compendium of diagnostic 
'fingerprints' for GPCR subtypes and their 
families will be described. These 
fingerprints offer new opportunities to 
investigate correlations between specific 
sequence motifs and ligand binding or 
G-protein coupling, and are likely to prove 
valuable both in seeking novel receptors in 
genome data and in the characterization of 
orphan receptors. 

G-protein-coupled receptors (GPCRs) 
constitute a vast group of cell-surface 
proteins that includes hormone, 
neurotransmitter, growth factor, light and 
odorant receptors. Approximately 2000 
members populate ~50 families within the 
rhodopsin-like superfamily, accounting for 
~1% of the vertebrate genome [1]. With so 
many GPCRs known, and perhaps 
hundreds awaiting discovery in the 
human genome, these receptors are of 
interest to the pharmaceutical industry 
because of the opportunities they afford 
for yielding novel drug targets [1-4]. 

More than 50% of prescription drugs act 
on GPCRs; however, some have efficacy 
problems and limiting side-effects because 
the compounds do not differentiate 
between receptor subtypes. There is thus 

Box 1. Identification of GPCRs using pattern databases 



Protein pattern databases are becoming 
increasingly valuable as diagnostic 
resources that complement the ubiquitous 
sequence similarity search tool BLAST. 
Pattern databases house characteristic 
family signatures, which are encoded in 
different ways within the different 
resources: some encode single motifs (e.g. 
PROSITE patterns); others use groups of 
motifs in the form of fingerprints (e.g. 
PRINTS); and others encode virtually the 
full family alignment (e.g. PROSITE 
profiles and Pfam). Because the 
underlying analysis methods are different, 
inevitably the databases have different 
diagnostic strengths and weaknesses. It is 
therefore instructive to compare the 
results of searching a range of these 
resources using the same query sequence. 
A convenient way of doing this is to use 
the InterPro interface at 
http://www.ebi.ac.uk/interpro/scan.html. 
The graphical output in Fig. I shows the 
result of searching PROSITE patterns, 
PROSITE profiles, Pfam and PRINTS with 
the human muscarinic acetylcholine M1 
receptor, ACM1_HUMAN. 

As shown, PROSITE patterns encode 
only single short motifs (yellow), whereas 
PROSITE profiles (orange) and Pfam (blue) 
utilize almost the complete sequence. By 
contrast, PRINTS fingerprints (green) 
encode groups of motifs that differentiate 
between regions of sequence that 
characterize the superfamily (sf) and those 
that characterize the family (f) and receptor 
subtype (st). Thus, it is evident from the 
comparison that although PROSITE 
patterns, PROSITE profiles and Pfam only 
furnish superfamily diagnoses, PRINTS 
provides a more fine-grained result. The 
detail conferred by a fingerprint match 
lends PRINTS a significant part of its 
diagnostic power. Using PRINTS, we can 
see immediately that the superfamily 
fingerprint encodes seven motifs 
[hyperlinks to the database confirm that 
these are the transmembrane (TM) 
domains], whereas the family and receptor 
subtype fingerprints comprise different 
parts of the terminal, loop and TM regions. 
The mutual context of motif neighbours 
within a fingerprint offers a unique 
diagnostic advantage. By contrast with the 
'pin-point' matches of PROSITE patterns 
and the 'blanket' matches of PROSITE 
profiles and Pfam, PRINTS motifs explicitly 
capture, and map, functionally and 
structurally important biological features. 
This is valuable for several reasons: (1) in 
analyses of uncharacterized genome data, 
fingerprints are not limited to superfamily- 
level diagnoses, but provide sufficient 
depth to be able to pinpoint particular 
receptor subtypes, thereby facilitating the 
identification of novel receptors (M.D.R. 
Croning and T.K. Attwood, unpublished); 
(2) by storing motifs that differentiate 
between families and between receptor 
subtypes, correlations with specific 
residues involved in ligand-binding and 
G-protein coupling can be investigated; 
and hence (3) such fine-tuning, and the 
explicit encoding of motifs involved in 
ligand-binding, yields greater promise for 
our future ability to characterize orphan 
receptors. 

considerable interest in attaining 
therapeutic selectivity by identifying the 
single receptor subtype that affects a 
particular physiology. The goal is to be able 
to design drugs without, or at least with 
less, side-effects, while retaining the 
desired function. Muscarinic agonists, for 
example, gained attention in research into 
Alzheimer's disease following the 
realization that the cardiovascular and 
gastrointestinal side-effects of 
nonselective muscarinic agonists could be 
avoided (i.e. muscarinic acetylcholine M1 
receptors in the brain might be involved in 
cognition, whereas other muscarinic 
receptor subtypes regulate heart and 
gastrointestinal functions [5]). 

Identification of GPCRs 

Routinely, computational strategies for 
identifying GPCRs tend to involve 
searches of sequence databases [e.g. using 
standard tools such as BLAST (Ref. 6)] 
and sometimes also of so-called 'pattern' 
databases, which house diagnostic protein 
family 'signatures' (Box 1). However, it is 
apparent that BLAST 'sees' similarity 
between pairs of sequences in a rather 
limited way: it reveals generic similarities 
(e.g. it can show that the sequences being 
compared share several hydrophobic 
regions) but it cannot recognize individual 
family traits [7] (i.e. it cannot distinguish the 
differences between the sequences, such 
as specific ligand-binding motifs). 
Similarly most pattern databases tend to 
provide generic signatures that are only 
capable of diagnosing superfamily 
relationships. Thus, these databases 
might recognize that a sequence belongs 
to the rhodopsin-like GPCR superfamily, 
but they cannot offer insights into the 
particular family to which it belongs. For 
researchers interested in, for example, the 
treatment of obesity and wishing 
specifically to identify type 4 melanocortin 
receptors (which are important in 
regulating appetite and body weight), a 
superfamily-level diagnosis is of limited 
value. Therefore, it seemed that it might 
be advantageous to develop a more fine- 
grained analytical approach for detecting 
GPCRs. 

TRENDS in Pharmacological Sciences Vol.22 No.4 April 2001 

[Figure 1: (a) table of matched fingerprints with columns Fingerprint, E-value and GRAPHScan; (b) schematic of motif locations for the fingerprints GPCRRHODOPSN, MUSCARINICR and MUSCRINICM1R] 

Fig. 1. (a) Hierarchical diagnosis returned from a PRINTS fingerprint search with the human muscarinic acetylcholine 
M1 receptor, ACM1_HUMAN (the search is effected simply by pasting the full sequence, its identifier or its accession 
number into the Web form at http://bioinf.man.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/muppet/FPScan.cgi). The 
result shows that three fingerprints have been matched, indicating that the sequence is likely to be a member of the 
rhodopsin-like G-protein-coupled receptor (GPCR) superfamily (fingerprint GPCRRHODOPSN), belonging to the 
muscarinic receptor family (MUSCARINICR) and being specifically an M1 receptor subtype (MUSCRINICM1R). The 
E-values in the centre of the table provide the measure of confidence in the results (E-values indicate the number of 
matches one would expect to see by chance: the smaller the number, the more likely the matches are to be 
biologically meaningful). Here, the results are all statistically significant (i.e. above the threshold value of 10^-4). 
(b) From left to right, the results in (a) are mapped in three dimensions onto a crude model and are illustrated 
schematically below. The coloured bars denote the relative locations and lengths of the constituent motifs within 
each fingerprint. The different regions that characterize the receptors at each level are clearly evident: motifs in the 
superfamily fingerprint encode each of the seven TM domains; those in the family fingerprint encode parts of TM and 
loop regions (here, TM domains 1, 3, 4, 5 and 7, the second cytoplasmic, and second and third external loops), the 
motifs mostly clustering around the ligand-binding domain; and motifs in the subtype fingerprint are drawn from the 
third cytoplasmic loop and the N- and C-terminal domains (not shown in 3D), areas known to be involved in 
regulating the selectivity and intensity of G-protein coupling [1]. 

Identification of specific receptor subtypes 

To facilitate the identification of 
particular subtypes, a systematic 
analysis of GPCRs was undertaken. 
Sequence alignments were created 
manually [8] for each of the different 
superfamilies and for their families and 
receptor subtypes. Regions of similarity 
and differences between alignments 
were then located and used to build a 
range of discriminatory 'fingerprints'. 
Fingerprints are groups of conserved 
motifs that together provide a signature 
of family membership {motifs tend to 
reflect functionally or structurally 
important regions within a protein 
family [e.g. transmembrane (TM) 
domains, protein-protein interaction 
sites, ligand-binding sites, and so on], 
thereby characterizing the families in 
which they are found}. For the purposes 



of this analysis, within superfamilies the 
motifs encoded the only features 
common to all members (i.e. the scaffold 
of seven TM domains) 9,10. Conversely, at 
the family level, the motifs focused on 
those regions that characterized the 
particular family, but distinguished it 
from the parent superfamily; 
predictably, these were usually small 
parts of TM and loop regions. For 
receptor subtypes, the distinguishing 
traits were largely present in the N- and 



C-terminal regions, and in the third 
cytoplasmic loop. 

To date, >200 GPCR-specific 
fingerprints have been created and made 
available as an integral component of the 
PRINTS fingerprint database 11 
(http://www.bioinf.man.ac.uk/dbbrowser/ 
PRINTS/printscontents.html#Receptors). 
By searching PRINTS with a given query, 
it is thus possible to make a hierarchical 
diagnosis, indicating to which superfamily 
and family the sequence belongs and 
which subtype it most resembles, as 
illustrated for the human M1 receptor in 
Fig. 1a. 
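
The hierarchical diagnosis described above can be sketched in a few lines of Python. This is only an illustration, not the PRINTS implementation: the parent links are taken from the three fingerprints named in the Fig. 1 caption, while the dictionary layout and the function names (`depth`, `hierarchical_diagnosis`) are assumptions introduced here.

```python
# Parent links among the three fingerprints named in the Fig. 1 caption.
# The dictionary layout is an assumption made for this sketch.
PARENT = {
    "GPCRRHODOPSN": None,            # superfamily fingerprint
    "MUSCARINICR": "GPCRRHODOPSN",   # family fingerprint
    "MUSCRINICM1R": "MUSCARINICR",   # subtype fingerprint
}

LEVELS = ["superfamily", "family", "subtype"]


def depth(name):
    """Number of parent links between a fingerprint and the superfamily."""
    steps = 0
    while PARENT[name] is not None:
        name = PARENT[name]
        steps += 1
    return steps


def hierarchical_diagnosis(matched):
    """Order matched fingerprints from the broadest level to the most specific."""
    ordered = sorted(matched, key=depth)
    return [(LEVELS[depth(name)], name) for name in ordered]


diagnosis = hierarchical_diagnosis(["MUSCRINICM1R", "GPCRRHODOPSN", "MUSCARINICR"])
for level, name in diagnosis:
    print(f"{level}: {name}")
```

Sorting by depth means the order in which a search happens to return its matches does not matter; the report always runs from superfamily to subtype.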

Biological significance of receptor motifs 

To gain a deeper insight into the biological 
relevance of these database matches, the 
results can be rationalized in three 
dimensions by mapping the constituent 
motifs of the different fingerprints onto a 
crude model 12. For these purposes, an old 
model based on the structure of 
bacteriorhodopsin 13 was used. Knowing 
that this was unlikely accurately to 
represent a GPCR (Ref. 14), our aim was 
simply to help visualize the relative 
three-dimensional (3D) locations of the 
motifs, rather than to ascertain precise 
atomic positions. As shown in Fig. 1b, the 
superfamily fingerprint encodes the 7TM 
scaffold, providing the architectural 
blueprint for all members; the family 
fingerprint focuses on the loop regions 
and on specific portions of the TM 
domains; and the subtype fingerprint is 
drawn from the third cytoplasmic loop 
and the N- and C-terminal domains. This 
is consistent with our expectation that 
portions of the TM segments are likely to 
constitute the ligand-binding domain, 
whereas the large intracellular region, 
unique to each subtype, is likely to 
constitute the effector-coupling domain 15 . 

Similar results can be visualized for 
all the GPCR families housed in PRINTS, 
either using the fingerPRINTScan suite 16 
(Fig. 1a) or the BLAST PRINTS server 17, 
both of which are accessible from the 
PRINTS home page (http:// 
www.bioinf.man.ac.uk/dbbrowser/PRINTS). 
Alternatively, a powerful new resource 
that allows comparison of results from 
searches of PRINTS, PROSITE (Ref. 18) 
and Pfam 19 is the integrated database of 
protein families, domains and functional 
sites known as InterPro (Ref. 20). By 
means of the graphical output from 
InterPro's sequence search, it is possible 
to place the fingerprint matches in 
context and see at a glance which regions 
of a sequence are matched by the 
different resources. The example 
discussed in Box 1 demonstrates the fine- 
tuning that fingerprints add to the 
diagnostic process. 

Concluding remarks 

GPCR fingerprints allow specific 
diagnoses, from the level of the superfamily 
down to the individual receptor subtype. No 
other computational approach currently 
offers such a hierarchical discriminatory 
system for this important class of receptors. 
The resource is thus a valuable 
complement to family and domain 
databases such as PROSITE and Pfam, 
offering potent diagnostic opportunities 
that have not been realised by other 
pattern-recognition methods. 
Furthermore, fingerprint selectivity offers 
new opportunities to explore in more detail 
correlations between specific motifs and 
ligand binding or G-protein coupling. With 
the availability of the first draft of the 
human genome, this collection of diagnostic 
GPCR fingerprints promises to find 
application in computational strategies to 
identify potential new drug targets and to 
characterize orphan receptors. 
Shields, R. (2001) The emperor's new clothes. Trends Genet. 17, 189 
Scan PRINTS with a PROTEIN query sequence, using an ID code from one 
of the following databases: {SWISSPROT SPTREMBL SWISSNEW 
TREMBLNEW} or by pasting it in as a raw sequence. 
Please note: DNA sequences are NOT catered for in this software. 

Important information concerning the E-value calculation: please read 

Please input either an ID code or a raw sequence: 

MGFNLTLAKLPNNELHGQESHNSGNRSDGPGKNTTLHNEF 
DTIVLPVLYLIIFVASILLNGLAVWIFFHIRNKTSFIFYL 
KNIVVADLIMTLTFPFRIVHDAGFGPWYFKFILCRYTSVL 
FYANMYTSIVFLGLISIDRYLKVVKPFGDSRMYSITFTKV 
LSVCVWVIMAVLSLPNIILTNGQPTEDNIHDCSKLKSPLG 
VKWHTAVTYVNSCLFVAVLVILIGCYIAISRYIHKSSRQF 
ISQSSRKRKHNQSIRVVVAVYFTCFLPYHLCRMPSTFSHL 
Highest scoring fingerprints for your query 

Fingerprint                E-value        GRAPHScan 
GPCRRHODOPSN (relations)   3.118054e-29   Graphic 






Ten top scoring fingerprints for your query 

Fingerprint    No. of Motifs  SumId    AveId  PfScore  Pvalue   Evalue   GRAPHScan 
GPCRRHODOPSN   7 of 7         1.8e+02  25     1733     1.2e-34  3.1e-29  IIIIIII 
PROTEASEAR     2 of 5         58.16    29.08  460      5.2e-08  0.013    I..I. 
CXCCHMKINER4   2 of 9         79.69    39.84  696      1.5e-07  0.038 
DUFFYANTIGEN   3 of 7         61.06    20.35  626      1.8e-06  0.47     I.I...I 
BRADYKININR    2 of 6         59.29    29.64  419      4.3e-06  1.1      .II... 
P2Y12PRNCPTR   2 of 3         54.53    27.26  466      1.4e-05  3.6      II. 
PAFRECEPTOR    3 of 11        78.40    26.13  677      1.5e-05  3.9 
ANGIOTENSINR   2 of 8         103.09   51.54  435      2.8e-05  7.3 
ACRIFLAVINRP   2 of 9         34.38    17.19  318      5.4e-05  14 
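
As a reading aid for the table above, the short Python sketch below filters the listed fingerprints by E-value. The E-values are copied from the table; the cutoff of 1e-4 and the names `top_hits` and `significant` are assumptions made for illustration, not settings taken from the fingerPRINTScan software.

```python
# (fingerprint, E-value) pairs copied from the top-scoring table above.
top_hits = [
    ("GPCRRHODOPSN", 3.1e-29),
    ("PROTEASEAR", 0.013),
    ("CXCCHMKINER4", 0.038),
    ("DUFFYANTIGEN", 0.47),
    ("BRADYKININR", 1.1),
    ("P2Y12PRNCPTR", 3.6),
    ("PAFRECEPTOR", 3.9),
    ("ANGIOTENSINR", 7.3),
    ("ACRIFLAVINRP", 14.0),
]

E_CUTOFF = 1e-4  # assumed cutoff, chosen only for this illustration


def significant(hits, cutoff=E_CUTOFF):
    """Return the names of hits whose E-value falls below the cutoff."""
    return [name for name, e_value in hits if e_value < cutoff]


print(significant(top_hits))
```

Under that assumed cutoff only the superfamily fingerprint, the one hit with all seven motifs matched, is retained; every family-level entry in the table matched only a subset of its motifs and carries a much larger E-value.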
FingerPrint Name  Motif Number  IdScore  PfScore  Pval      Sequence 
GPCRRHODOPSN      1 of 7        23.40    225      8.79e-07  VLPVLYLIIFVASILLNGLAVWIFF 
                  2 of 7        24.21    190      8.19e-05  FIFYLKNIVVADLIMTLTFPFR 
                  3 of 7        35.51    339      1.10e-08  FYANMYTSIVFLGLISIDRYLKV 
                  4 of 7        23.63    251      3.97e-04  FTKVLSVCVWVIMAVLSLPNII 
                  5 of 7        21.00    134      9.54e-03  VTYVNSCLFVAVLVILIGCYIAIS 
                  6 of 7        21.06    264      2.05e-04  HNQSIRVVVAVYFTCFLPYHLCRMP 
                  7 of 7        27.45    330      1.97e-07  KEITLFLSACNVCLDPIIYFFMCRSFS 
PROTEASEAR        1 of 5        28.57    236      3.07e-04  KNTTLHNEFDTIVLPVLY 
                  4 of 5        29.59    224      1.69e-04  QSIRVVVAVYFTCF 
CXCCHMKINER4      2 of 9        43.75    346      1.31e-04  HNEFDTIVLPVLYLII 
                  4 of 9        35.94    350      1.12e-03  GPWYFKFILCRYTSVL 
DUFFYANTIGEN      1 of 7        14.88    146      7.40e-02  LPVLYLIIFVASILLNGLAVWIFF 
                  3 of 7        20.78    263      2.81e-03  ILCRYTSVLFYANMYTSIVFLG 
                  7 of 7        25.40    217      8.79e-03  HLDRLLDESAQKILYYCKEITLFLSAC 
BRADYKININR       2 of 6        33.57    245      5.18e-04  YTSVLFYANMYTSI 
                  3 of 6        25.71    174      8.35e-03  AVLSLPNIILTNGQ 
P2Y12PRNCPTR      1 of 3        31.25    208      7.77e-03  SRMYSITFTKVLSVCV 
                  2 of 3        23.28    258      1.78e-03  HKSSRQFISQSSRKRKHNQSIRVVVAVYF 
PAFRECEPTOR       4 of 11       21.15    165      8.94e-02  GLISIDRYLKVVKPFGDSRMYSITFT 
                  8 of 11       33.33    332      1.92e-03  PYHLCRMPSTFSHLD 
                  10 of 11      23.91    180      8.91e-02  FFMCRSFSRWLFKKSNIRPRSES 
ANGIOTENSINR      1 of 8        60.49    253      1.67e-03  LYLIIFVAS 
                  4 of 8        42.59    182      1.70e-02  VLSLPNIILTNG 
CELLSNTHASEA      5 of 9        15.87    165      5.67e-02  HIRNKTSFIFYLKNIVVADLIMTLTFP 
                  9 of 9        23.38    280      9.09e-04  TYVNSCLFVAVLVILIGCYIAI 
ACRIFLAVINRP      3 of 9        17.92    168      4.03e-03  GKNTTLHNEFDTIVLPVLYLIIFV 
                  7 of 9        16.46    150      1.34e-02  ITFTKVLSVCVWVIMAVLSLPNII 

http://bioinf.man.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/muppet/FPScan.cgi 

10/18/2001 



> USER_SEQUENCE 

MGFNLTLAKLPNNELHGQESHNSGNRSDGPGKNTTLHNEF 
DTIVLPVLYLIIFVASILLNGLAVWIFFHIRNKTSFIFYL 
KNIVVADLIMTLTFPFRIVHDAGFGPWYFKFILCRYTSVL 
FYANMYTSIVFLGLISIDRYLKVVKPFGDSRMYSITFTKV 
LSVCVWVIMAVLSLPNIILTNGQPTEDNIHDCSKLKSPLG 
VKWHTAVTYVNSCLFVAVLVILIGCYIAISRYIHKSSRQF 
ISQSSRKRKHNQSIRVVVAVYFTCFLPYHLCRMPSTFSHL 
DRLLDESAQKILYYCKEITLFLSACNVCLDPIIYFFMCRS 
FSRWLFKKSNIRPRSESIRSLQSVRRSEVRIYYDYTDV 
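
To show how the seven GPCRRHODOPSN motif matches reported in the results sit along the query, the following Python sketch locates each motif in USER_SEQUENCE by exact string search. It assumes the seventh sequence line reads ...QSIRVVVAVYFTCF..., as the reported motif matches indicate; the names `QUERY`, `MOTIFS` and `locate` are introduced here for illustration. Exact matching is used only because the printout lists the matched segments of the query itself; it is not how fingerPRINTScan scores motifs.

```python
# Query sequence from the results page, joined into one string.
# Assumption: line 7 is read as ...IRVVVAV..., consistent with the motif matches.
QUERY = (
    "MGFNLTLAKLPNNELHGQESHNSGNRSDGPGKNTTLHNEF"
    "DTIVLPVLYLIIFVASILLNGLAVWIFFHIRNKTSFIFYL"
    "KNIVVADLIMTLTFPFRIVHDAGFGPWYFKFILCRYTSVL"
    "FYANMYTSIVFLGLISIDRYLKVVKPFGDSRMYSITFTKV"
    "LSVCVWVIMAVLSLPNIILTNGQPTEDNIHDCSKLKSPLG"
    "VKWHTAVTYVNSCLFVAVLVILIGCYIAISRYIHKSSRQF"
    "ISQSSRKRKHNQSIRVVVAVYFTCFLPYHLCRMPSTFSHL"
    "DRLLDESAQKILYYCKEITLFLSACNVCLDPIIYFFMCRS"
    "FSRWLFKKSNIRPRSESIRSLQSVRRSEVRIYYDYTDV"
)

# Matched segments reported for GPCRRHODOPSN motifs 1 to 7.
MOTIFS = [
    "VLPVLYLIIFVASILLNGLAVWIFF",
    "FIFYLKNIVVADLIMTLTFPFR",
    "FYANMYTSIVFLGLISIDRYLKV",
    "FTKVLSVCVWVIMAVLSLPNII",
    "VTYVNSCLFVAVLVILIGCYIAIS",
    "HNQSIRVVVAVYFTCFLPYHLCRMP",
    "KEITLFLSACNVCLDPIIYFFMCRSFS",
]


def locate(sequence, motifs):
    """Return (motif, 0-based start) pairs; start is -1 when a motif is absent."""
    return [(motif, sequence.find(motif)) for motif in motifs]


hits = locate(QUERY, MOTIFS)
for number, (motif, start) in enumerate(hits, start=1):
    if start < 0:
        print(f"motif {number}: not found")
    else:
        print(f"motif {number}: residues {start + 1}-{start + len(motif)}")
```

The seven segments occur in order and without overlap along the query, which is the pattern expected if each one corresponds to one of the seven TM domains.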
The application of computational methods to DNA and protein science is a new and exciting 
development in biology. Bioinformatics: Sequence and Genome Analysis is a comprehensive 
introduction to this emerging field of study. The book has many unique and valuable features: 

• Essential for any biologist who wants to understand methods of sequence and 
structure analysis and how the necessary computer programs work. 

• Sequence alignment, structure prediction, phylogenetic and gene prediction, database 
searching, and genome analysis are clearly explained and amply illustrated. 

• Underlying algorithms and assumptions are clearly explained for the non-specialist. 

• Examples are presented in simple numerical terms rather than complex formulas and 
notation. 

• Theoretical underpinnings are linked to biological problems and their solutions. 

• Extensive tables provide descriptions and Web sources for a broad range of publicly 
available software. 

• An associated Website (www.bioinformaticsonline.org), accessible free of charge by 
book purchasers, provides links to Internet sources referred to in the text, as well as 
problem sets for classroom use, and other useful material not included in the text. 

Based on a well-established course given at the University of Arizona by the author, David 
Mount, this book is an ideal foundation for teaching at an undergraduate and graduate level. 
It is also highly suited for the self-instruction of investigators interested in the application of 
methods and strategies in functional genomics and for the needs of information specialists 
working in molecular biology and pharmaceutical laboratories. 
CHAPTER 7 

INTRODUCTION 

Database similarity searches have become a mainstay of bioinformatics. Large sequencing 
projects in which all the genomic DNA sequence of an organism is obtained have 
become quite commonplace. The genomes of a number of model organisms have been 
sequenced, including the budding yeast Saccharomyces cerevisiae, the bacterium Escherichia 
coli, the worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the human 
species Homo sapiens. These species have also been subjected to intense biological analysis 
to discover the functions of the genes and encoded proteins. Thus, there is a good deal of 
information available as to the biological function of particular sequences in model organisms 
that may be exploited to predict the function of similar genes in other organisms. In 
addition to genomic DNA sequences, complete cDNA copies of messenger RNAs that 
carry all the sequence information for the protein products have also been obtained for 
some of the expressed genes of various organisms. Translation of these cDNA copies provides 
a close-to-correct prediction of the sequence of the encoded proteins. Because 
obtaining intact cDNA sequences is laborious and time-consuming, a common practice is 
to make a library of partial cDNA sequences from the expressed genes, and then to perform 
high-throughput, low-accuracy sequencing of a large number of these partial sequences, 
known as expressed sequence tags (ESTs). The objective of an EST project is to find enough 
sequence of each cDNA and to have enough accuracy in the sequence that the amino acid 
sequence of a significant length of the encoded protein can be predicted. Overlapping ESTs 
can then be combined, and interesting ones can be found by database similarity searches. 
The full cDNA sequence of these genes of interest may then be obtained. Once all the 
sequence information is collected and placed in the sequence databases, the big task at 
hand is to search through the databases to locate similar sequences that are predicted to 
have a similar biological function through a close evolutionary relationship. 

Sequence database searches can also be remarkably useful for finding the function of 
genes whose sequences have been determined in the laboratory. The sequence of the gene 
of interest is compared to every sequence in a sequence database, and the similar ones are 
identified. Alignments with the best-matching sequences are shown and scored. If a query 
sequence can be readily aligned to a database sequence of known function, structure, or 
biochemical activity, the query sequence is predicted to have the same function, structure, 
or biochemical activity. The strength of these predictions depends on the quality of the 
alignment between the sequences. As a rough rule, if more than one-half of the amino acid 
sequence of query and database proteins is identical in the sequence alignments, the prediction 
is very strong. As the degree of similarity decreases, confidence in the prediction 
also decreases. The programs used for these database searches provide statistical evaluations 
that serve as a guide for evaluation of the alignment scores. 
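
The rough one-half rule quoted above can be made concrete with a small Python sketch that computes percent identity over the aligned columns of two already-aligned sequences. The function name `percent_identity`, the choice to ignore gapped columns in the denominator, and the two toy sequences are assumptions made for illustration; other identity definitions use different denominators and can give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical ten-residue alignment, used only to exercise the function.
query = "MGFNLTLAKL"
subject = "MGFNVTLSKL"
identity = percent_identity(query, subject)
print(f"{identity:.1f}% identical; above one-half: {identity > 50.0}")
```

A percentage of this kind is only the rough guide the passage describes; the statistical evaluations reported by the search programs remain the primary measure of an alignment's significance.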

Previous chapters have described methods for aligning sequences or for finding common 
patterns within sequences. The purpose of making alignments is to discover whether 
or not sequences are homologous or derived from a common ancestor gene. If a homology 
relationship can be established, the sequences are likely to have maintained the same 
function as they diverged from each other during evolution. If an alignment can be found 
that would rarely be observed between random sequences, the sequences are predicted to 
be related with a high degree of confidence. The presence of one or more conserved patterns 
in a group of sequences is also useful for establishing evolutionary and structure-function 
relationships among sequences. 

The above methods of establishing sequence relationships have been utilized in database 
searches that are summarized in Table 7.1. In addition to standard searches of a sequence 
The Importance of (Sub)sequence Comparison in 
Molecular Biology 

Sequence comparison, particularly when combined with the systematic collection, curation, 
and search of databases containing biomolecular sequences, has become essential 
in modern molecular biology. Commenting on the (then) near-completion of the effort to 
sequence the entire yeast genome (now finished), Stephen Oliver says 

In a short time it will be hard to realize how we managed without the sequence data. Biology 
will never be the same again. [478] 

One fact explains the importance of molecular sequence data and sequence comparison 
in biology. 

The first fact of biological sequence analysis 

In biomolecular sequences (DNA, RNA, 
or amino acid sequences), high sequence similarity usually implies significant functional 
or structural similarity. 

Evolution reuses, builds on, duplicates, and modifies "successful" structures (proteins, 
exons, DNA regulatory sequences, morphological features, enzymatic pathways, etc.). 
Life is based on a repertoire of structured and interrelated molecular building blocks that 
are shared and passed around. The same and related molecular structures and mechanisms 
show up repeatedly in the genome of a single species and across a very wide spectrum 
of divergent species. "Duplication with modification" [127, 128, 129, 130] is the central 
paradigm of protein evolution, wherein new proteins and/or new biological functions are 
fashioned from earlier ones. Doolittle emphasizes this point as follows: 

The vast majority of extant proteins are the result of a continuous series of genetic duplications 
and subsequent modifications. As a result, redundancy is a built-in characteristic of protein 
sequences, and we should not be surprised that so many new sequences resemble already 
known sequences. [129] 

He adds that 

... all of biology is based on an enormous redundancy [130] 

The following quotes reinforce this view and suggest the utility of the "enormous 
redundancy" in the practice of molecular biology. The first quote is from Eric Wieschaus, 
cowinner of the 1995 Nobel prize in medicine for work on the genetics of Drosophila 
development. The quote is taken from an Associated Press article of October 9, 1995. 
Describing the work done years earlier, Wieschaus says 

We didn't know it at the time, but we found out everything in life is so similar, that the same 
genes that work in flies are the ones that work in humans. 
And fruit flies aren't special. The following is from a book review on DNA repair [424]: 

Throughout the present work we see the insights gained through our ability to look for 
sequence homologies by comparison of the DNA of different species. Studies on yeast are 
remarkable predictors of the human system! 

So "redundancy" and "similarity" are central phenomena in biology. But similarity has 
its limits - humans and flies do differ in some respects. These differences make conserved 
similarities even more significant, which in turn makes comparison and analogy very 
powerful tools in biology. Lesk [297] writes: 

It is characteristic of biological systems that objects that we observe to have a certain form 
arose by evolution from related objects with similar but not identical form. They must, 
therefore, be robust, in that they retain the freedom to tolerate some variation. We can take 
advantage of this robustness in our analysis: By identifying and comparing related objects, 
we can distinguish variable and conserved features, and thereby determine what is crucial to 
structure and function. 

The important "related objects" to compare include much more than sequence data, 
because biological universality occurs at many levels of detail. However, it is usually easier 
to acquire and examine sequences than it is to examine fine details of genetics or cellular 
biochemistry or morphology. For example, there are vastly more protein sequences known 
(deduced from underlying DNA sequences) than there are known three-dimensional pro- 
tein structures. And it isn't just a matter of convenience that makes sequences important. 
Rather, the biological sequences encode and reflect the more complex common molecular 
structures and mechanisms that appear as features at the cellular or biochemical levels. 
Moreover, "nowhere in the biological world is the Darwinian notion of 'descent with mod- 
ification' more apparent than in the sequences of genes and gene products" [130]. Hence 
a tractable, though partly heuristic, way to search for functional or structural universality 
in biological systems is to search for similarity and conservation at the sequence level. 
The power of this approach is made clear in the following quotes: 

Today, the most powerful method for inferring the biological function of a gene (or the protein 
that it encodes) is by sequence similarity searching on protein and DNA sequence databases. 
With the development of rapid methods for sequence comparison, both with heuristic al- 
gorithms and powerful parallel computers, discoveries based solely on sequence homology 
have become routine. [360] 

Determining function for a sequence is a matter of tremendous complexity, requiring biolog- 
ical experiments of the highest order of creativity. Nevertheless, with only DNA sequence it 
is possible to execute a computer-based algorithm comparing the sequence to a database of 
previously characterized genes. In about 50% of the cases, such a mechanical comparison 
will indicate a sufficient degree of similarity to suggest a putative enzymatic or structural 
function that might be possessed by the unknown gene. [9*1] 

Thus large-scale sequence comparison, usually organized as database search, is a very 
powerful tool for biological inference in modern molecular biology. And that tool is almost 
universally used by molecular biologists. It is now standard practice, whenever a new gene 
is cloned and sequenced, to translate its DNA sequence into an amino acid sequence and 
then search for similarities between it and members of the protein databases. No one today 
would even think of publishing the sequence of a newly cloned gene without doing such 
database searches. 
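
The first step of the standard practice described above, translating a DNA sequence into an amino acid sequence before searching the protein databases, can be sketched as follows. This is a minimal illustration that assumes the standard genetic code, reading frame 1 and an uppercase or lowercase DNA string; the names `CODON_TABLE` and `translate` and the example DNA string are hypothetical and are not taken from any of the cited works.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons TTT to TGG
    "LLLLPPPPHHQQRRRR"   # codons CTT to CGG
    "IIIMTTTTNNKKSSRR"   # codons ATT to AGG
    "VVVVAAAADDEEGGGG"   # codons GTT to GGG
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate frame 1 of a DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding fragment, chosen only to exercise the function.
print(translate("ATGGGCTTCAACTAA"))
```

The resulting protein string is what would then be submitted to a similarity search; a fuller treatment would also translate the other five reading frames.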
The final quote reflects the potential total impact on biology of the first fact and its 
exploitation in the form of sequence database searching. It is from an article [179] by 
Walter Gilbert, Nobel prize winner for the coinvention of a practical DNA sequencing 
method. Gilbert writes: 

The new paradigm now emerging, is that all the 'genes' will be known (in the sense of being 
resident in databases available electronically), and that the starting point of biological inves- 
tigation will be theoretical. An individual scientist will begin with a theoretical conjecture, 
only then turning to experiment to follow or test that hypothesis. 

Already, hundreds (if not thousands) of journal publications appear each year that report 
biological research where sequence comparison and/or database search is an integral part 
of the work. Many such examples that support and illustrate the first fact are distributed 
throughout the book. In particular, several in-depth examples are concentrated in Chap- 
ters 14 and 15 where multiple string comparison and database search are discussed. But 
before discussing those examples, we must first develop, in the next several chapters, the 
techniques used for approximate matching and (sub)sequence comparison. 

Caveat 

The first fact of biological sequence analysis is extremely powerful, and its importance 
will be further illustrated throughout the book. However, there is not a one-to-one corre- 
spondence between sequence and structure or sequence and function, because the converse 
of the first fact is not true. That is, high sequence similarity usually implies significant 
structural or functional similarity (the first fact), but structural or functional similarity 
does not necessarily imply sequence similarity. On the topic of protein structure, F. Cohen 
[106] writes ". . . similar sequences yield similar structures, but quite distinct sequences 
can produce remarkably similar structures". This converse issue is discussed in greater 
depth in Chapter 14, which focuses on multiple sequence comparison. 



Most-Cited Papers of 1990-98 

1990: 

S.F. Altschul, et al., "Basic Local Alignment Search Tool," J. Mol. Biol., 215:403, 1990. Citations: 
9,969 

1991: 

S. Moncada, R.M.J. Palmer, E.A. Higgs, "Nitric Oxide: Physiology, pathophysiology, and pharmacology," 
Pharm. Rev., 43:109, 1991. Citations: 6,655 

1992: 

R.O. Hynes, "Integrins: Versatility, modulation, and signaling in cell adhesion," Cell, 69:11, 1992. 
Citations: 4,610 

1993: 

M.J. Berridge, "Inositol trisphosphate and calcium signaling," Nature, 361:315, 1993. Citations: 
3,446 

1994: 

J.D. Thompson, D.G. Higgins, T.J. Gibson, "Clustal W: Improving the sensitivity of progressive multiple 
sequence alignment through sequence weighting, position-specific gap penalties and weight matrix 
choice," Nucl. Acids Res., 22:4673, 1994. Citations: 3,352 

1995: 

C.B. Thompson, "Apoptosis in the pathogenesis and treatment of disease," Science, 267:1456, 1995. 
Citations: 1,745 

1996: 

R.M. Barnett, et al., "Particles and fields. 1. Review of particle physics," Phys. Rev. D, 54:1, 1996. 
Citations: 1,342 

1997: 

S.F. Altschul, et al., "Gapped BLAST and Psi-BLAST: A new generation of protein database search 
programs," Nucl. Acids Res., 25:3389, 1997. Citations: 1,534 

1998: 

S.H. Landis, et al., "Cancer statistics, 1998," CA-A Canc. J., 48:6, 1998. Citations: 605 

Source: High-Impact Papers 

ISI, Thomson Scientific 



http://www.isinet.com/isi/hot/research/200014/33 1 007/ 



9/18/2001 



Proc. Natl. Acad. Sci. USA 
Vol. 88, pp. 11515-11519, December 1991 
Biochemistry 

Expression cloning of a cDNA encoding the bovine histamine 
H1 receptor 

(adrenal medulla/Xenopus oocyte/[3H]mepyramine/doxepin) 

Masakatsu Yamashita*, Hiroyuki Fukui*†, Kazushige Sugama*, Yoshiyuki Horio*, Seiji Ito*, 
Hiroyuki Mizuguchi*, and Hiroshi Wada* 

*Department of Pharmacology II, Faculty of Medicine, Osaka University, Suita 565, Japan; and †Department of Cell Biology, Osaka Bioscience Institute, 
Suita 565, Japan 

Communicated by Esmond E. Snell, September 13, 1991 

ABSTRACT A functional cDNA clone for the histamine H1 
receptor was isolated from a cDNA library of bovine adrenal 
medulla by a combination of molecular cloning in an expression 
vector and electrophysiological assay in Xenopus oocytes. The 
H1 receptor cDNA encodes a protein of 491 amino acids (Mr 
55,954) with seven putative transmembrane domains, illustrating 
the similarity to other receptors that couple with guanine 
nucleotide-binding regulatory proteins (G protein-coupled 
receptors). The sequence homology between the H1 and H2 
receptors is not higher than that between the histamine H1 and 
m1-muscarinic receptors. The cloned receptor protein expressed 
in COS-7 cells bound specifically to [3H]mepyramine, 
an H1 receptor antagonist, and this binding was displaced by 
H1 receptor antagonists and histamine with affinities comparable 
with those in membranes of bovine adrenal medulla. H1 
receptor mRNA was shown to be expressed in brain and in 
peripheral tissues, including lung, small intestine, and adrenal 
medulla. This investigation discloses the molecular nature of 
the H1 receptor, a receptor that mediates diverse neuronal and 
peripheral actions of histamine and that may be of therapeutic 
importance in allergy. 



Since Dale and Laidlaw (1) first reported the contraction of 
smooth muscle by histamine, the pharmacological significance 
of this phenomenon has been extensively investigated. 
Three subtypes of histamine receptor (H1, H2, and H3) are 
known. The H1 receptor was identified by Ash and Schild (2) 
and H1 receptor antagonists have been used in the therapy of 
many allergic diseases, including urticaria, allergic rhinitis, 
pollenosis, and bronchial asthma. In peripheral tissues, the 
histamine H1 receptor mediates the contraction of smooth 
muscles, increase in capillary permeability due to contraction 
of terminal venules, and catecholamine release from adrenal 
medulla (3), as well as mediating neurotransmission in the 
central nervous system (4). Although signal transduction of 
the H1 receptor through Ca2+ mobilization via an increase in 
the intracellular inositol 1,4,5-trisphosphate level has been 
extensively investigated (5, 6), little is known about the 
molecular structure of the histamine H1 receptor. Recently, 
another method for cDNA cloning of Ca2+-mobilizing receptors 
through their expression in Xenopus oocytes has been 
developed (7). Meyerhof et al. (8) and Sugama et al. (9) have 
reported that the injection of poly(A)+ RNA prepared from 
bovine adrenal medulla into Xenopus oocytes resulted in 
functional expression of the histamine H1 receptor in 
oocytes. The present study describes the cloning and sequencing 
of a cDNA encoding histamine H1 receptor from 
a cDNA library of bovine adrenal medulla using in vitro RNA 
transcription and electrophysiological assay with Xenopus 
oocytes. 

MATERIALS AND METHODS 

Materials. [3H]Mepyramine (1073 GBq/mmol) and [α-32P]dCTP 
(≈111 TBq/mmol) were purchased from DuPont/NEN. 
Histamine and (+)-chlorpheniramine were purchased 
from Wako Pure Chemical (Osaka) and Tokyo Kasei (Tokyo), 
respectively. Mepyramine and doxepin were purchased 
from Sigma. (-)-Chlorpheniramine and famotidine were gifts 
from Smith Kline & French and Yamanouchi Pharmaceutical 
(Tokyo), respectively. A mammalian expression vector pEF-BOS 
(10) was donated by S. Nagata of the Osaka Bioscience 
Institute. 

Isolation of Poly(A)+ RNA. Total RNA was extracted by the 
acid guanidinium isothiocyanate/phenol/chloroform method 
(11). Poly(A)+ RNA was isolated by chromatography on 
oligo(dT)-cellulose (12). 

Expression Cloning of Histamine H1 Receptor cDNA. Bovine 
adrenal medullary poly(A)+ RNA (≈180 µg) was size-fractionated 
on a 5-25% (wt/vol) sucrose-density gradient. 
An aliquot (1 µl) of each poly(A)+ RNA fraction (20 µl) was 
injected into Xenopus oocytes, and electrophysiological assay 
by measuring Ca2+-dependent inward Cl- currents was 
done as described (9). The fraction that showed the highest 
histamine-induced inward Cl- currents was used for 
oligo(dT)-primed cDNA synthesis. Double-stranded cDNAs of 
>2-kilobase (kb) pairs were size-selected by agarose gel 
electrophoresis followed by elution with Geneclean II (Bio 
101, La Jolla, CA) and were ligated into λZAPII (Stratagene) 
at the EcoRI site. The library was divided and amplified in 65 
pools of ≈20,000 independent clones each. In vitro transcription 
was done essentially according to the procedure of Julius 
et al. (13). RNA transcripts (≈5 ng) from each pool were 
individually injected into Xenopus oocytes. After incubation 
for 1-2 days, the oocytes were tested for inward Cl- currents 
induced by 100 µM histamine under a voltage clamp at -60 
mV. The single positive pool of 20,000 clones was progressively 
subdivided into smaller pools of 8000, 4000, 400, and 
15 clones until finally a single clone was obtained. cDNA 
encoding the histamine H1 receptor was sequenced by the 
M13 chain-termination method (14) using a DNA sequencer 
(model 370A, Applied Biosystems). The sequence homology 
search was done by using DNASIS (Hitachi Software Engineering, 
Yokohama, Japan). 

Expression of Histamine H1 Receptor in COS-7 Cells and Its
Determination by [3H]Mepyramine-Binding Assay. An EcoRI
fragment (2.7 kb) of the H1 receptor cDNA was subcloned
into the mammalian expression vector pEF-BOS at the BstXI
site. COS-7 cells were transfected by the DEAE-dextran
method and were harvested after 60 hr (15). Preparation of
membranes from COS-7 cells and [3H]mepyramine-binding
assay were done by a described method (16). Nonspecific
bindings of [3H]mepyramine to both transfected and nontransfected
cells at 2.6 nM radioligand were <10% of total
binding to nontransfected cells. Specific binding of [3H]mepyramine
to the nontransfected cells was observed (basal
control), but that from the transfected cells assayed with 2.6
nM [3H]mepyramine (3.4 pmol/mg of protein) was ≈30 times
the basal control (0.1 pmol/mg of protein). Specific binding
of [3H]mepyramine to the expressed binding site was calculated
by subtracting specific [3H]mepyramine binding to the
nontransfected cells from that to the transfected cells.
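
The subtraction described in this paragraph can be checked against the two binding values quoted in the text. The following is only a minimal Python sketch: the function name is hypothetical, and the only numbers taken from the paper are the 3.4 and 0.1 pmol/mg values.

```python
def expressed_site_binding(transfected, nontransfected):
    """Binding attributable to the expressed site, in pmol/mg of protein.

    Both arguments are specific-binding values; the basal (nontransfected)
    value is subtracted from the transfected value, as the text describes.
    """
    return transfected - nontransfected


transfected = 3.4  # pmol/mg, transfected COS-7 cells at 2.6 nM radioligand
basal = 0.1        # pmol/mg, nontransfected cells (basal control)

net = expressed_site_binding(transfected, basal)
fold = transfected / basal

print(f"net binding = {net:.1f} pmol/mg, fold over basal = {fold:.0f}")
```

The ratio of the two quoted values is about 34, which is in line with the roughly 30-fold increase stated in the text.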

RNA Blot Analysis. Poly(A)+ RNA prepared from various
bovine tissues was separated (7 μg per lane) by formaldehyde/1%
agarose gel electrophoresis (17) and transferred to
a nylon membrane (Schleicher & Schuell). A 2.7-kb EcoRI
fragment of the histamine H1 receptor cDNA was labeled
with [α-32P]dCTP by the random-priming method and was

[Fig. 2 sequence listing: nucleotide positions -107 to 2853 with the deduced amino acid sequence; not legible in the scanned copy.]

Fig. 2. Nucleotide and deduced amino acid sequences of the histamine H1 receptor cDNA clone. Sequences of both strands of cDNA were
determined. Positions of the putative transmembrane segments I-VII of the H1 receptor are indicated below amino acid sequence; the terminal
of each segment is tentatively assigned from a hydropathy profile. Triangles indicate potential N-glycosylation sites.
Fig. 1. (A) Current trace recorded from a Xenopus oocyte
injected with in vitro synthesized histamine H1 receptor mRNA. (B)
Mepyramine (10 μM) was administered 30 sec before histamine
application. Recordings were obtained at a voltage-clamped membrane
potential of -60 mV. Concentration of histamine applied was
100 μM; horizontal bar indicates duration of application. Data were
reproducible (n = 5), and representative tracings are shown.
used as a probe (18). Hybridization was done at 42°C in 5x
standard saline citrate/20 mM sodium phosphate, pH 7.0/1x
Denhardt's solution/50% (vol/vol) formamide/0.1% SDS/
10% (wt/vol) dextran sulfate/salmon sperm DNA at 100
μg/ml. The membrane was washed with 0.1x standard saline
citrate and 0.1% SDS at 42°C.

RESULTS 

Isolation of a Histamine H1 Receptor cDNA. Poly(A)+ RNA
isolated from bovine adrenal medulla was size-fractionated in
a sucrose-density gradient. Two peaks giving histamine-evoked
inward currents in oocytes were observed in the size
range of 2.5- to 3.5-kb nucleotides and above 5-kb nucleotides
(data not shown). A cDNA library was constructed from
poly(A)+ RNA in the fraction of 2.5- to 3.5-kb nucleotides
giving the highest response. Of 65 pools tested, only one pool
gave small inward currents in response to 100 μM histamine.
After several subdivisions of the positive pool, a single clone
encoding a functional histamine H1 receptor was isolated;
histamine induced inward Cl- currents in oocytes injected
with in vitro-transcribed mRNA from the cloned histamine
H1 receptor cDNA (Fig. 1), and mepyramine, an H1 receptor
antagonist, at 10^-6 M completely blocked the histamine-induced
response in oocytes.

Primary Structure of the Histamine H1 Receptor. The
nucleotide and deduced amino acid sequences of the bovine
histamine H1 receptor are shown in Fig. 2. The clone (2960
nucleotides long) consisted of 107 nucleotides of the 5'-untranslated
region, 1473 nucleotides of the coding region,
and 1380 nucleotides of the 3'-untranslated region. The
histamine H1 receptor cDNA encodes a protein of 491 amino
acids with an Mr of 55,954.
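
The figures quoted in this paragraph are mutually consistent, as a short arithmetic check shows. The Python sketch below uses only the numbers reported in the text; the variable names are illustrative, and the mean residue mass is simply the reported Mr divided by the residue count.

```python
CODON_LENGTH = 3  # nucleotides per codon

five_prime_utr = 107    # nucleotides, 5'-untranslated region
coding_region = 1473    # nucleotides, coding region
three_prime_utr = 1380  # nucleotides, 3'-untranslated region

# The three regions should add up to the reported clone length.
clone_length = five_prime_utr + coding_region + three_prime_utr

# The coding region should divide evenly into the reported number of residues.
codons, remainder = divmod(coding_region, CODON_LENGTH)

# Average mass per residue implied by the reported Mr of 55,954.
mean_residue_mass = 55954 / codons

print(clone_length, codons, remainder, round(mean_residue_mass, 1))
```

The regions sum to 2960 nucleotides, the coding region corresponds to 491 codons with no remainder, and the implied mean residue mass of about 114 is in the range expected for a typical protein.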

Pharmacological Characterization of [3H]Mepyramine
Binding to the Histamine H1 Receptor Expressed in COS-7
Cells. For determination of pharmacological characters of the
receptor, the EcoRI fragment (2.7 kb) of the H1 receptor
cDNA was subcloned into the mammalian expression vector
pEF-BOS, and the vector was introduced into monkey kidney
COS-7 cells. After 60-hr incubation, the binding of
[3H]mepyramine to the membranes from the cells was measured.
Specific binding of [3H]mepyramine to the expressed
binding site was saturable, and Scatchard plot analysis indicated
the presence of a single binding site with a Kd value of
3.2 nM and a Bmax value of 6.6 pmol/mg of protein (Fig. 3A).
Ki values of mepyramine and (+)- and (-)-chlorpheniramines
were determined to be 2.6 x 10^-9 M, 8.0 x 10^-9 M,
and 7.6 x 10^-7 M, respectively (Fig. 3B). These Kd and Ki
values and the stereoselectivity of (+)- and (-)-chlorpheniramines
for the binding site expressed in COS-7 cells were
comparable with those for adrenal medullary membranes.
The Kd value was 1.5 x 10^-9 M; Ki values were 1.8 x 10^-9
M (mepyramine), 4.3 x 10^-9 M [(+)-chlorpheniramine], and
4.6 x 10^-7 M [(-)-chlorpheniramine], as described (19).
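
For readers unfamiliar with Scatchard analysis, the Python sketch below shows how Kd and Bmax are recovered from a straight-line fit of bound/free against bound for a single binding site. The data points are synthetic, generated from the reported Kd (3.2 nM) and Bmax (6.6 pmol/mg) rather than taken from the paper's measurements, and the free-ligand concentrations and function names are assumptions made for illustration.

```python
def one_site_binding(free_nM, kd_nM, bmax):
    """Specific binding for a single class of sites: B = Bmax * F / (Kd + F)."""
    return bmax * free_nM / (kd_nM + free_nM)


def scatchard_fit(free_nM, bound):
    """Least-squares line through (B, B/F); returns (Kd, Bmax).

    For one site, B/F = Bmax/Kd - B/Kd, so the slope is -1/Kd and the
    x-intercept is Bmax.
    """
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free_nM)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, -intercept / slope


# Synthetic, noise-free points (assumed concentrations, in nM).
free = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
bound = [one_site_binding(f, kd_nM=3.2, bmax=6.6) for f in free]

kd_fit, bmax_fit = scatchard_fit(free, bound)
print(f"Kd = {kd_fit:.2f} nM, Bmax = {bmax_fit:.2f} pmol/mg")
```

With real, noisy data the points scatter around the line, and a curved Scatchard plot would argue against the single-site interpretation given in the text.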

Tissue Distribution of Histamine H1 Receptor mRNA. Tissue
distribution of receptor mRNA was determined by RNA
blot analysis (Fig. 4). A band of 3.0-kb nucleotides corresponding
to a histamine H1 receptor mRNA was detected in
various bovine tissues. The level of H1 receptor mRNA was
high in the lung and small intestine, moderate in the adrenal
medulla and uterus, and lower in the cerebral cortex and
spleen. No H1 receptor mRNA was detectable in the cardiac
atrium or liver.

DISCUSSION 

In the present study, we isolated and sequenced a cDNA
clone for the bovine histamine H1 receptor by using an oocyte
expression system and also examined the pharmacological
properties of this receptor and the tissue distribution of its
mRNA.

The cloned cDNA had no poly(A)+, but its size [2960 base
pairs (bp)] was comparable with that of histamine H1 receptor
mRNA determined by RNA blot analysis. The Mr of the encoded
H1 receptor (55,954) was also consistent with the values
estimated by photoaffinity labeling of bovine adrenal medulla
(Mr 53,000-58,000) (19) and in guinea pig tissues (Mr 56,000-
57,000) (20). Hydropathy-profile analysis (21) of the histamine
H1 receptor revealed the existence of seven putative
transmembrane domains, indicating a similar topology to
those proposed for other G protein-coupled receptors. The
histamine H1 receptor also possesses a characteristic large
third cytoplasmic loop and short carboxyl terminus (22), as
do the m1-muscarinic (23) and dopamine D2 (24) receptors.
We observed another ATG codon 39 bp downstream from the
presumed initiation codon. Comparison with Kozak consensus
sequence (25) indicated that neither of the two ATG
codons had any advantage as an initiation codon. However,
as receptors for biogenic amines and acetylcholine possess
conservative aspartate residues at position 108 as putative
binding sites for their monoamine and tertiary-amine residues
(26), we presume that the upstream ATG codon is the
Fig. 3. Binding of [3H]mepyramine to transfected COS-7 cell membranes. (A) Saturation isotherm of specific binding of [3H]mepyramine
to membranes from COS-7 cells transfected with the receptor cDNA (o). (Inset) Scatchard plot of this data. B/F, bound/free. (B) Inhibition
of [3H]mepyramine binding to transfected COS-7 cell membranes by various drugs. Membranes were incubated with 4 nM [3H]mepyramine and
various concentrations of doxepin (a), mepyramine (o), (+)-chlorpheniramine (o), (-)-chlorpheniramine (■), famotidine U), or histamine (•).
Data points are means of triplicate experiments.

Fig. 4. RNA blot analysis of mRNA isolated from various bovine
tissues. Lanes contain 7-μg samples of poly(A)+ RNA from cerebral
cortex (lane 1), lung (lane 2), liver (lane 3), cardiac atrium (lane 4),
small intestine (lane 5), adrenal medulla (lane 6), spleen (lane 7), and
uterus (lane 8). Arrow indicates H1 receptor mRNA.

initiation codon because it would give a histamine H1 receptor
with the conservative aspartate residue at position 108.
The histamine H1 receptor is highly similar to other G
protein-coupled receptors. The sequence of the histamine H1
receptor is compared with those of some other G protein-coupled
receptors in Fig. 5. Sequence homology of transmembrane
domains between H1 and H2 receptors (40.7%)
(27) is not higher than that between H1 and m1-muscarinic
receptors (44.3%) (23).
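
Percentages of this kind are obtained by counting identical residues across aligned positions. The Python sketch below shows one common way to do that; the two short aligned fragments are hypothetical and are not taken from Fig. 5, and skipping gapped columns is an assumption, since the text does not state how the authors treated gaps.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over aligned, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
fragment_a = "LVLYAVRSER"
fragment_b = "LVLIAV-SEK"

identity = percent_identity(fragment_a, fragment_b)
print(f"{identity:.1f}% identity")
```

Here 7 of the 9 ungapped columns match, giving about 77.8%. Applied to the transmembrane segments of two receptors, the same count would yield figures comparable to the 40.7% and 44.3% quoted above.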

There are two potential N-glycosylation sites (Asn-5, Asn-18)
in the amino-terminal region with a consensus sequence
Asn-Xaa-Ser/Thr (Fig. 2) (29). Mitsuhashi and Payan (30)
reported regulation of the affinity of the histamine H1 receptor
by its glycosylation. An additional N-glycosylation site
(Asn-187) was observed in the second extracellular loop of
the cloned receptor.
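
The Asn-Xaa-Ser/Thr consensus can be located with a simple pattern scan. The Python sketch below is illustrative only: the 20-residue string is the amino terminus as read from the scanned Fig. 2 and may contain transcription errors, and excluding proline at the Xaa position is a commonly used refinement that the text itself does not state.

```python
import re

# Lookahead so that overlapping sequons would also be reported.
# Assumption: Xaa may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# First 20 residues as read from the scan of Fig. 2 (treat as illustrative).
n_terminus = "MTCPNSSCVFEDKMCQGNKT"

sites = n_glycosylation_sites(n_terminus)
print(sites)  # [5, 18]
```

The two positions found, 5 and 18, agree with the Asn-5 and Asn-18 sites named in the text.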

The third cytoplasmic loop of the histamine H1 receptor,
which, by analogy, is thought to interact with a G protein, has
many serine and threonine residues that may serve as sites for
phosphorylation by protein kinases (Fig. 2). Signal transduction
through the histamine H1 receptor is depressed by
activation of protein kinase C in various cells (31-33). Thus,
the potential sites of phosphorylation in the third cytoplasmic
loop may play an important role in regulating signal transduction
through the receptor molecule.

Amino acid residues that are conserved in G protein-coupled
receptors were also seen in the H1 receptor: (i) Two
cysteines (Cys-101 and Cys-181) that have been proposed to
form a disulfide bond appear in the first and the second
extracellular loops (34). (ii) An aspartate residue (Asp-74) is
present in the second transmembrane domain. (iii) An anionic
and cationic amino acid pair (Asp-125 and Arg-126) occurs at
the cytoplasmic border of the third transmembrane domain.
(iv) A conservative sequence of 10 amino acids (Leu-460-
Pro-469) is observed in the seventh transmembrane domain.

The H1 receptor mRNA was visualized by RNA blot
analysis in various bovine tissues in which the existence of H1
receptors was reported (3). The presence of the H1 receptor
mRNA in bovine uterus was clearly demonstrated, whereas
only H2 receptors (35) and both H1 and H2 receptors (36) were
reported present in the uterus from pharmacological studies.
The band of H1 receptor mRNA from brain was unexpectedly
faint (Fig. 4); this observation was surprising because the
[3H]mepyramine-binding capacities of brain membranes from
various species are reported comparable to those of membranes
from peripheral tissues (6). Doxepin is a potent
displacer of [3H]mepyramine bound to the histamine H1
receptor from bovine adrenal medulla (Fig. 3). A doxepin-insensitive
subtype of histamine H1 receptor has been proposed
to be present in brain because the binding capacity of
[3H]doxepin to rat brain membranes is ~10% that of [3H]mepyramine
(37).

Cardiac atrium and liver did not give detectable bands of H1
receptor mRNA (Fig. 4). Pharmacological studies indicate
the presence of H1 receptors in heart (3). However, biochemical
results (20) show that the Mr of the histamine H1 receptor
in guinea pig heart is 68,000, which is larger than the sizes (Mr
56,000-57,000) of these receptors in lung, intestine, and
cerebellum, suggesting a subtype of H1 receptors in heart in
which the H1 receptor mRNA does not hybridize with the
cloned cDNA. A relatively large amount of [3H]mepyramine-binding
protein is present in liver and was recently suggested



[Fig. 5 alignment: sequences of the H1, H2, M1, α1, 5HT-1c, and D2 receptors across transmembrane segments I-VII; not legible in the scanned copy.]

Fig. 5. Alignment of amino acid sequences of bovine histamine H1 receptor (H1) and some representative G protein-coupled receptors. H2,
canine histamine H2 receptor (27); M1, mouse m1-muscarinic receptor (23); α1, bovine α1C-adrenergic receptor (28); 5HT-1c, rat serotonin 1c
receptor (13); and D2, rat dopamine D2 receptor (24). Amino acid residues shown by boldfaced type in sequences are identical; residues
nonhomologous with H1 receptor sequence in the loop between transmembrane segments V-VI are summed in parentheses. Positions of putative
transmembrane segments I-VII of H1 receptor are indicated.
to be a member of the family of debrisoquine-type cytochrome
P450s (38).

The receptor cDNA clone for the classical histamine
receptor (3), the H1 receptor, isolated in this study, will be
useful for molecular studies of function and regulation of
activities mediated through the H1 receptor molecule and for
molecular analysis of possible H1 receptor subclasses. In situ
and immunocytochemical studies on localization of the H1
receptor will also be helpful in analyzing physiological functions
of histamine in the central nervous system and in
peripheral tissues.
ABSTRACT The H2 subclass of histamine receptors mediates
gastric acid secretion, and antagonists for this receptor
have proven to be effective therapy for acid peptic disorders of
the gastrointestinal tract. The physiological action of histamine
has been shown to be mediated via a guanine nucleotide-binding
protein linked to adenylate cyclase activation and
cellular cAMP generation. We capitalized on the technique of
polymerase chain reaction, using degenerate oligonucleotide
primers based on the known homology between cellular receptors
linked to guanine nucleotide-binding proteins to obtain a
partial-length clone from canine gastric parietal cell cDNA.
This clone was used to obtain a full-length receptor gene from
a canine genomic library. Histamine increased in a dose-dependent
manner cellular cAMP content in L cells permanently
transfected with this gene, and preincubation of the cells
with the H2-selective antagonist cimetidine shifted the dose-response
curve to the right. Cimetidine inhibited the binding of
the radiolabeled H2 receptor-selective ligand [methyl-3H]tiotidine
to the transfected cells in a dose-dependent fashion, but
the H1-selective antagonist diphenhydramine did not. These
data indicate that we have cloned a gene that encodes the H2
subclass of histamine receptors.



Histamine is one of the major determinants of gastric acid 
secretion. On the gastric parietal cell, histamine exerts its 
stimulating action through an H2 subclass of receptor cou- 
pled via a guanine nucleotide-binding protein (G protein) to 
activation of adenylate cyclase and production of cAMP. 
Antagonism of histamine's action at this receptor has been 
the cornerstone of an immense market for pharmacological 
treatment of acid-peptic disorders of the gastrointestinal 
tract. Through its three known receptor subclasses (H1, H2,
and H3), histamine has been shown to exert a broad array of 
other physiological actions as well, including mediation of 
allergic and anaphylactic responses, modulation of cardiac 
contractility and systemic blood pressure, and mediation of 
neural function in the central nervous system (1-4). Despite 
this wealth of pharmacological information, little is known 
about the structure of the histamine receptor. The present 
studies describing the cloning and sequencing of a gene
encoding a protein with the functional characteristics of an 
H2 subclass of histamine receptors provide insight into the 
molecular biology of histamine action. 

In recent years the genes for a family of G protein-linked 
receptors have been cloned, and analysis of the deduced 
structures of their proteins has indicated that they have a 
motif of seven transmembrane regions. Capitalizing on the 
similarities of the amino acids comprising the transmembrane 
regions, Libert et al. have devised a strategy to clone other 
members of this family (5). By using synthetic oligonucleotides
complementary to the DNA encoding the transmembrane
regions of known G protein-linked receptors as primers
for the polymerase chain reaction (PCR), they were able to
generate partial cDNA sequences encoding proteins having 
the common transmembrane motif. We utilized this strategy 
to clone the histamine H2 receptor gene, using cDNA from 
canine gastric parietal cell mRNA as a template. 

MATERIALS AND METHODS 

Isolation of Parietal Cell mRNA. Cells from freshly obtained
canine fundic mucosa were dispersed by sequential exposure
to crude collagenase at 0.25 μg/ml and 1 mM EDTA, and a
fraction enriched in parietal cells (70%) was isolated by
counterflow elutriation by the method of Soll (6). RNA was
extracted by the acid guanidinium isothiocyanate-phenol-chloroform
method (7), and poly(A)+ RNA was obtained by
oligo(dT)-cellulose chromatography. The poly(A)+ RNA
served as a template for cDNA synthesis using the avian
myeloblastosis virus reverse transcriptase (Seikagaku America,
Rockville, MD). The cDNA thus obtained functioned as
a template for the PCR with the oligonucleotide primers
described below.

PCR. Oligonucleotides corresponding to the third and sixth 
transmembrane domains of G protein-linked receptors were 
duplicated from the design of Libert et al. (5) with the 
exception that our primers lacked the linker sequences. The 
primers were synthesized by using an Applied Biosystems 
380B DNA synthesizer. The conditions for the PCR were as 
follows: denaturation for 1.5 min at 94°C, annealing for 2 min 
at 45°C, and extension for 4 min at 72°C. The reaction was 
carried out for 30 cycles, and then 20% of the product was 
added to fresh buffer and submitted to another 30 cycles. The 
final reaction products were extracted with phenol/chloro- 
form, 1:1 (vol/vol), and then precipitated with ethanol. DNA 
polymerase I Klenow fragment was used to form blunt-ended 
DNA, and the products of this reaction were electrophoresed 
on a 2% NuSieve/1% Seaplaque gel (FMC). Of the two major 
bands that were produced, the one of ≈400 base pairs (bp)
was cut from the gel and subcloned directly into the phage 
M13 sequencing vector (8). Dideoxynucleotide sequencing 
was then performed by the chain-termination method of 
Sanger (9) with Sequenase version 2 (United States Biochem- 
ical). 
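
The cycling conditions above can be written down as data and used to estimate how long the thermal cycler is occupied. The Python sketch below uses only the step times, temperatures, and cycle counts given in the text; the class and variable names are illustrative, and ramp times between temperatures are ignored, so the totals are a lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    minutes: float


# One PCR cycle as described in the text.
CYCLE = (
    Step("denaturation", 94, 1.5),
    Step("annealing", 45, 2.0),
    Step("extension", 72, 4.0),
)
CYCLES_PER_ROUND = 30
ROUNDS = 2  # 20% of the first-round product was re-amplified for 30 more cycles

minutes_per_cycle = sum(step.minutes for step in CYCLE)
minutes_per_round = minutes_per_cycle * CYCLES_PER_ROUND
total_minutes = minutes_per_round * ROUNDS

print(f"{minutes_per_cycle} min per cycle, "
      f"{minutes_per_round / 60:.2f} h per round, "
      f"{total_minutes / 60:.1f} h in total")
```

Each cycle holds for 7.5 min, so each 30-cycle round takes 3.75 h and the two rounds together take 7.5 h of hold time.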

Genomic Cloning. The partial-length PCR-derived clone
was random-primed (10) with 32P and used as a probe to
screen a canine genomic library (Clontech). Under high-stringency
hybridization [0.9 M sodium chloride/0.09 M
sodium citrate (6x SSC) at 65°C] and wash conditions (0.1x
SSC at 55°C), a single clone exhibited a positive hybridization
signal with the probe. Restriction enzyme mapping of the
DNA insert in this clone revealed an Xba I-Sau I fragment
that contained the partial-length PCR-derived clone, which
was inserted into the M13 vector and sequenced.

Expression Experiments. The presumed full-length coding
region of the receptor was subcloned into CMVneo, a
pUC13-based vector that also contains the lac UV5-SV40
(simian virus 40) promoter (440 bp), Tn5-neo (1400 bp), SV40
splice site and polyadenylylation signal (320 bp), cytomegalovirus
(CMV) promoter (700 bp), and human growth hormone
polyadenylylation signal (700 bp) (11). L cells were
transfected by the technique of calcium phosphate coprecipitation
(12). Permanently transfected L cells were selected by
adding the neomycin analogue G418 to the culture medium at
600 μg/liter. The expression of the receptor gene in the
selected clones was examined by RNA blot hybridization
(Northern) analysis (see below) coupled with functional
assays as follows. The cells were incubated in Earle's balanced
salt solution with varying concentrations of histamine
for 60 min at 37°C after a 60-min preincubation in medium
with or without 100 μM cimetidine. Ice-cold 30% trichloroacetic
acid was added to stop the reaction and precipitate the
cellular protein. After centrifugation for 10 min at 1900 x g,
the supernatant was extracted with ether, lyophilized, and




Fig. 1. Gel electrophoresis of PCR products from a gastric 
parietal cell cDNA template. 

resuspended in 50 mM Tris/2 mM EDTA, pH 7.5. The 
content of cAMP was measured by a competitive protein- 
binding assay using an Amersham kit. For binding studies, 
transfected L cells were plated and grown to confluence in 2.4 
[Fig. 2 sequence listing: nucleotide and single-letter amino acid sequence of the canine H2 receptor gene; not legible in the scanned copy.]

Fig. 2. The nucleotide and deduced amino acid sequence (in single-letter code) of the canine histamine H2 receptor gene.
Fig. 3. Expression of the canine histamine H2 receptor gene in
various tissues. (Left) Northern blot showing the hybridization of 10
μg of poly(A)+ RNA extracted from each of the designated tissues
with the 32P-labeled gene. (Right) Comparison of the expression of
the receptor gene in a fraction of fundic mucosal cells consisting of
roughly 70% parietal cells and a fraction consisting of nearly
100% chief cells, the primary contaminant in the parietal cell-enriched
fraction.

x 1.7 cm multiwell plates. The culture medium was removed,
and cells were washed twice with Earle's balanced salt
solution containing 0.1% bovine serum albumin. An aliquot
(36 nCi; 1 Ci = 37 GBq) of [methyl-3H]tiotidine (87 Ci/mmol;
DuPont) was added to the culture in the presence of either
cimetidine or diphenhydramine; after 1 hr of incubation, the
medium was removed by aspiration. After the cells were
washed twice with phosphate-buffered saline (PBS), pH 7.4,
and lysed with 1% Triton X-100, the radioactivity was
quantified. Maximum binding was determined by incubation
of [methyl-3H]tiotidine with transformed L cells in the absence
of antagonists. Nonspecific binding, which was subtracted
from total binding to obtain specific binding, was
determined as the amount of label remaining bound in the
presence of 100 μM histamine.

Northern Blots. The expression of the cloned gene was
examined in various tissues by Northern blot analysis. For
these studies, poly(A)+ RNA was extracted as described
above, separated on a 1.25% formaldehyde-agarose gel, and
blotted to nitrocellulose. Hybridization was performed under
conditions as described (13) with the presumed coding region
of the receptor gene that had been labeled with 32P by random
priming (10). The final washing of the blot was in 0.1x SSC
at 65°C.
Fig. 4. Response to exogenously administered histamine of L
cells transfected with a CMVneo vector containing the canine
histamine H2 receptor gene insert. The data represent means ± SEM
from four experiments. Response was shifted by addition of 0.1 mM
cimetidine, an H2 receptor-selective antagonist.

Fig. 5. Inhibition of [methyl-3H]tiotidine binding to transfected L
cells by diphenhydramine and cimetidine. The data are from a single
experiment and are virtually identical to the data obtained in two
other experiments.

RESULTS 

An ethidium bromide-stained gel of the products of PCR is 
depicted in Fig. 1. As noted above, two major bands of ≈400
bp and ≈350 bp were produced, and the former band was cut
from the gel and cloned into phage M13. Of 12 clones 
obtained, only 1 had the nucleotide and deduced amino acid 
sequence expected of a G protein-linked seven-transmem- 
brane receptor. Computer analysis of the amino acid se- 
quence of this single clone revealed extensive homology to 
other known G protein-linked receptors, and Kyte-Doolittle 
analysis confirmed the presence of the two hydropathic 
putative transmembrane domains between the third and sixth 
transmembrane sequences upon which the primers were 
based (14). Screening a canine genomic DNA library resulted 
in one clone with a positive hybridization signal. The nucle- 
otide and deduced amino acid sequence of the presumed 
coding region of this gene is depicted in Fig. 2. Northern blot 
analysis showed that the gene was expressed most abun- 
dantly in the gastric fundus and, to a lesser extent, in the brain 
(see Fig. 3). Further analysis revealed that parietal cells were 
most likely to be the origin of the positive hybridization signal 
obtained with gastric poly(A)+ RNA.
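
A Kyte-Doolittle analysis of the kind mentioned above averages per-residue hydropathy values over a sliding window and looks for sustained hydrophobic stretches. The Python sketch below is an illustration under stated assumptions: the 60-residue string is the start of the deduced sequence as read from the scanned Fig. 2 and may contain transcription errors, the 19-residue window and the threshold of about 1.6 are commonly cited choices rather than parameters reported by the authors, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    if window > len(sequence):
        raise ValueError("window is longer than the sequence")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First 60 residues as read from the scan of Fig. 2 (treat as illustrative).
n_terminal_region = (
    "MISNGTGSSFCLDSPPCRITVSVVLTVLIL"
    "ITIAGNVVVCLAVGLNRRLRSLTNCFIVSL"
)

profile = hydropathy_profile(n_terminal_region)
best = max(range(len(profile)), key=profile.__getitem__)
peak_start = best + 1  # 1-based position of the most hydrophobic window
peak_score = profile[best]

print(f"most hydrophobic window starts at residue {peak_start}, "
      f"mean hydropathy {peak_score:.2f}")
```

Windows whose mean exceeds the threshold are candidates for membrane-spanning segments; in this fragment the peak falls in the long run of valine, leucine, and isoleucine residues that follows the hydrophilic amino terminus.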

The L cells transfected with the H2 receptor construct 
showed dose-dependent increases in cellular cAMP content 
in response to histamine stimulation (Fig. 4), reaching a 
maximum response of 217 ± 10% over basal (mean ± SEM; 
n = 3) after the 10 histamine dose. The dose-response 
curve could be shifted to the right by the H2 receptor- 
selective antagonist cimetidine. Serotonin, epinephrine, dopamine,
and carbamoylcholine in doses as high as 100 μM
had no effect on cAMP content. Nontransfected L cells, L
cells transfected with a CMVneo vector missing the receptor
gene construct insert, and L cells transfected with a CMVneo
vector containing as an insert a gene encoding the α catalytic
subunit of the cAMP-dependent protein kinase all failed to
demonstrate any response to histamine. Cimetidine displaced
binding of [methyl-3H]tiotidine to transfected cells in a dose-dependent
fashion with an ED50 of 5.5 ± 0.6 x 10^-7 M (mean
± SEM; n = 4) (Fig. 5). In contrast, diphenhydramine, a
relatively selective H1 receptor antagonist, demonstrated no
ability to inhibit [methyl-3H]tiotidine binding except at the highest
dose.

DISCUSSION 

We utilized the PCR to clone a gene encoding a protein with 
the functional properties of a histamine H2 receptor. Al- 
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CANH2 - ET SLRSHSSQLA-RNQSREPHRQEEKPLK-LQVWSGTEVTAPRGATDR 

HAMADRB2 - YGNGY SSNSNGKTDYKGEASGCQLC&-EKESERLCEDPPGTESFVNCQGTVPSLSLDSQGRNCSTNDSPL 

HUMADB3 - CAAARPALFPSGVPAARSSPAQPRLCQRLDG 

BOVSUBK - EDKMELTYTPSLSTRVNRCHTKEIFFMSGDVAPSEAVNGQAESPQAGVSTEP 

Fig. 6. Structural comparison of the putative histamine H2 receptor with other G protein-linked receptors. The deduced amino acid 
sequences of the receptors (indicated by the conventional single-letter abbreviations) are aligned on the basis of homologous regions, which are 
shown by boldface letters. The roman numerals indicate the putative transmembrane domains. CANH2, canine H2 receptor; HAMADRB2, 
hamster β2-adrenergic receptor (15); HUMADB3, human β3-adrenergic receptor (16); BOVSUBK, bovine substance K receptor (17); 
HUMACHRM2, human M2-muscarinic receptor (18); RATDOP2, rat dopamine D2 receptor (19). 

though the approach that we utilized to obtain this clone was 
nonspecific, we purposely targeted a particular tissue known 
to contain certain G protein-linked receptors of interest, 
including those for histamine and gastrin. The full-length 
clone obtained was initially for a receptor specific for an 
unknown ligand; however, comparison of the deduced amino 
acid sequence to that of other G protein-linked receptors with 
presumed seven-transmembrane motifs revealed extensive 
homology (Fig. 6). Like the genes encoding many of the other 
members of this family, our gene appeared to be devoid of 
introns as well (20). Several features of the amino acid 
sequence deduced from our gene were notable and provided 
clues as to its identity. The first clue was the aspartic acid 
residue in the third transmembrane domain. An aspartic acid 
in this position has been shown by mutational analysis to be 
important for ligand binding to the β-adrenergic receptor, 
which is also a member of this receptor family (Fig. 7A). It is 
hypothesized that the carboxyl group of the aspartic acid 
moiety acts as a counter anion to the cationic amino group of 
β-adrenergic agonists (21). Indeed, receptors for a number of 
cationic biogenic agonists such as dopamine and acetylcho- 
line are also characterized by the presence of this aspartic 
acid residue, while receptors for other ligands such as peptide 
hormones are not. The second structural feature of note was 
the absence of the two serine residues present in the fifth 
transmembrane region of receptors for catecholamines and 
dopamine as highlighted in Fig. 7B. This information 

[Fig. 7 alignment, panels A and B: sequence rows not legible in this copy] 

Fig. 7. Structural comparisons of the third (A) and fifth (B) 
transmembrane domains of the canine H2 receptor (CANH2) with 
those of other G protein-linked receptors: HUMADA2 (26), 
HUMADB1 (27), and HUMADB3, human α2-, β1-, and β3-adrenergic 
receptors; HAMADRA1 and HAMADRB2, hamster α1- and β2-adrenergic 
receptors; HUMACHRM2, human M2-muscarinic receptor; 
RATDOP2, rat dopamine D2 receptor; RATSUBP (28, 29), rat 
substance P receptor; BOVSUBK, bovine substance K receptor; 
MAS, product of mas oncogene (30). 

suggested that our clone encoded a novel class of receptor. 
However, the conservative substitution of a threonine resi- 
due and an aspartic residue for the two serine residues was 
of particular interest in view of the data suggesting that the 
serines are sites of hydrogen bonding to the hydroxyl groups 
present in the catechol ring of adrenergic agonists (22). A 
third structural feature of interest (Fig. 6) was the homology 
of the carboxyl- and amino-terminal ends of the third cyto- 
plasmic loop (between the fifth and sixth transmembrane 
regions) with comparable regions of the β-adrenergic receptor, 
which have been shown previously to be of critical 
importance to its linkage to the G protein associated with 
adenylate cyclase activation (22, 23). 

This structural information suggested the possibility that 
our clone encoded a receptor for a positively charged bio- 
genic amine linked to adenylate cyclase activation. We 
hypothesized that the most likely such receptor on gastric 
parietal cells would be the H2 subtype of histamine receptor. 
This hypothesis was tested and proven by inserting the 
presumed coding region of the receptor gene into the eukary- 
otic expression vector CMVneo, expressing it in mouse L 
cells, and measuring the changes in cellular cAMP content 
induced by histamine. We characterized further the nature of 
the histamine receptor subtype encoded in our cloned gene 
by demonstrating the specific binding of [methyl-3H]tiotidine, 
a labeled H2-receptor antagonist, to L cells transformed 
with the receptor gene. Our data confirmed that our 
clone encoded the H2 subtype of histamine receptor. 

An interesting feature of our cloned gene is the presence of 
an out-of-frame ATG codon 50 bp upstream of the presumed 
initiation codon of the major open reading frame (Fig. 2). A 
similar short open reading frame upstream of the major open 
reading frame has been described previously for the β-adrenergic 
receptor, although its significance is yet unknown 
(15, 24). The translation initiation sequence of the major open 
reading frame is more consistent with the consensus eukary- 
otic translation initiation sequence (25). The transcription 
initiation site of our receptor gene has not been determined; 
however, we examined two different receptor gene con- 
structs in L cells, one containing the entire gene sequence as 
described in Fig. 2 and the other lacking the short upstream 
open reading frame. Expression of both of these constructs 
resulted in L cells that exhibited histamine binding and cAMP 
generation in response to histamine (data not shown). While 
we did not compare levels of expression, the upstream 
segment is apparently not essential for histamine receptor 
gene expression. 

As mentioned above, a major difference in the structural 
features of the H2 receptor and that of catecholamine recep- 
tors is the absence of the two serine residues in the fifth 
transmembrane domain. However, with the knowledge that 
the natural ligand for the former receptor is an imidazole, it 
is possible to speculate on the nature of the ligand-receptor 
interaction. The aspartic and threonine residues that have 
substituted for the serine moieties have the ability to interact 
via hydrogen bonds with the nitrogen moieties on the imida- 
zole ring of histamine. Future mutational analysis of this site 
will be required to substantiate the validity of this model. 
Nonetheless, through modeling and analysis it may be pos- 
sible to define the nature of histamine binding and, perhaps 
more importantly from a therapeutic standpoint, inhibition of 
histamine binding. 

By taking advantage of the marked homology between 
receptors linked to G proteins, we have been successful in 
cloning a gene encoding the H2 subtype of histamine recep- 
tors despite starting without even rudimentary knowledge of 
the biochemistry of this receptor. If there were substantial 
homology among the histamine receptor subtypes as there is, 
for example, among the catecholamine receptor subtypes, it 
might be possible to extend these findings on the H2 receptor 
ultimately to structural information on the H1 and H3 receptors 
through cloning of their genes as well. 

ABSTRACT 

Histamine regulates neurotransmitter release in the central and 
peripheral nervous systems through H3 presynaptic receptors. 
The existence of the histamine H3 receptor was demonstrated 
pharmacologically 15 years ago, yet despite intensive efforts, 
its molecular identity has remained elusive. As part of a directed 
effort to discover novel G protein-coupled receptors through 
homology searching of expressed sequence tag databases, we 
identified a partial clone (GPCR97) that had significant homology 
to biogenic amine receptors. The GPCR97 clone was used 
to probe a human thalamus library, which resulted in the isolation 
of a full-length clone encoding a putative G protein-coupled 
receptor. Homology analysis showed the highest similarity 
to M2 muscarinic acetylcholine receptors and overall low 
homology to all other biogenic amine receptors. Transfection of 
GPCR97 into a variety of cell lines conferred an ability to inhibit 
forskolin-stimulated cAMP formation in response to histamine, 
but not to acetylcholine or any other biogenic amine. Subsequent 
analysis revealed a pharmacological profile practically 
indistinguishable from that for the histamine H3 receptor. In situ 
hybridization in rat brain revealed high levels of mRNA in all 
neuronal systems (such as the cerebral cortex, the thalamus, 
and the caudate nucleus) previously associated with H3 receptor 
function. Its widespread and abundant neuronal expression 
in the brain highlights the significance of histamine as a general 
neurotransmitter modulator. The availability of the human H3 
receptor cDNA should greatly aid in the development of chemical 
and biological reagents, allowing a greater appreciation of 
the role of histamine in brain function. 



Since its first pharmacological description as an endogenous 
substance in 1910 (Barger and Dale, 1910), histamine 
has proven to exert tremendous influence over a variety of 
physiological processes. Most notable are its roles in the 
inflammatory "triple response" and in gastric acid secretion, 
which are mediated by H1 (Ash and Schild, 1966) and H2 
(Black et al., 1972) receptors, respectively. In the early 1970s 
emerged an understanding that histamine is a neurotransmitter 
in the central nervous system (Schwartz et al., 1970; 
Baudry et al., 1975). In 1983, a third subtype of histamine 
receptor, H3, was identified as a presynaptic autoreceptor on 
histamine neurons in the brain controlling the stimulated 
release of histamine (Arrang et al., 1983). Subsequently, the 
H3 receptor has been shown to be a presynaptic heteroreceptor 
in nonhistamine-containing neurons in both the central 
and peripheral nervous systems (for review, see Hill et al., 
1997). Through the molecular cloning of H1 and H2, these 
receptors were proven to belong to the superfamily of G 
protein-coupled receptors (GPCRs; Gantz et al., 1991; 
Yamashita et al., 1991). For the past 10 years, the histamine H3 
receptor has been the target of numerous cloning and purification 
attempts, yet its molecular identity has remained an 
enigma. 

We have initiated an effort to identify and clone orphan 
GPCRs as a means to identify novel drug targets and as a 
way to discover novel neurotransmitters and peptides. This 
is an approach used by many investigators, and it has led to 
the successful identification of ligands such as nociceptin 
(Reinscheid et al., 1995), prolactin-releasing factor (Hinuma 
et al., 1998), the orexins (Sakurai et al., 1998), and, more 
recently, apelin (Tatemoto et al., 1998). There are at least 70 
orphan GPCRs in the public domain. We have identified, 
through searching public and private databases, at least 30 
additional putative members of this family via expressed 
sequence tags (ESTs). One of these orphan receptors, our 
designation GPCR97, was expressed abundantly in the central 
nervous system, and its 5'-most sequence shares significant 
homology with the putative transmembrane domain 
VII of several members of the biogenic amine family of receptors. 
Therefore, we investigated the possibility that the 
GPCR97 cDNA encodes a novel neurotransmitter receptor. 

ABBREVIATIONS: GPCR, G protein-coupled receptor; EST, expressed sequence tag; cAMP, cyclic AMP; PCR, polymerase chain reaction. 

Experimental Procedures 

Materials. Human mRNA and all Northern blots were purchased 
from Clontech (Palo Alto, CA). cDNA synthesis kits were purchased 
from Gibco Life Technologies (Gaithersburg, MD). Gelzyme was ob- 
tained from Invitrogen (San Diego, CA), and pCIneo vector was 
obtained from Promega (Madison, WI). All cell lines were obtained 
from American Type Culture Collection (Manassas, VA). Cyclic AMP 
(cAMP) Flashplates were obtained from DuPont/New England Nuclear 
(Boston, MA). Fluo-3 was purchased from TEF Laboratories 
(Austin, TX). G418 was purchased from Calbiochem (San Diego, CA). 
All histamine ligands were purchased from Research Biochemicals, 
Inc. (Natick, MA). All other reagents were purchased from Sigma 
Chemical Co. (St. Louis, MO). 

Cloning of GPCR97 cDNA. A human thalamus cDNA library 
was constructed from poly(A)+-selected RNA as described by the 
manufacturer (Gibco Life Technologies). Double-stranded DNA was 
digested with NotI and then run on a 0.8% low-melting agarose gel, 
and cDNA in the range of 2.5 to 5 kilobases (kb) was excised, purified 
with Gelzyme, and subsequently was subcloned into pSport vector. 
The size-selected human thalamus cDNA library was screened with 
a radiolabeled fragment of the GPCR97 EST clone. A full-length 
GPCR97 was obtained and, subsequently, cloned into the mamma- 
lian expression vector pCIneo (Promega) and transfected into human 
embryonic kidney 293 cells, rat C6 glioma cells, and human SK- 
N-MC neuroblastoma cells. 

Transfection of Cells with GPCR97 cDNA. Cells were grown 
to about 70% to 80% confluence and then removed from the plate 
with trypsin and pelleted in a clinical centrifuge. The pellet was then 
resuspended in 400 µl of complete media and transferred to an 
electroporation cuvette with a 0.4-cm gap between the electrodes (no. 
165-2088; Bio-Rad Laboratories, Hercules, CA). One microgram of 
supercoiled DNA was added to the cells and mixed. The voltage for 
the electroporation was set at 0.25 kV and the capacitance was set at 
960 µF. After electroporation, the cells were diluted into 10 ml of 
complete media and were plated onto four 10-cm dishes at the fol- 
lowing ratios: 1:20, 1:10, 1:5, and the remaining cells. The cells were 
allowed to recover for 24 h before the addition of G-418. Colonies that 
survived selection were grown and tested. Several different cell lines 
were used for transfection, which served two purposes. First, because 
single-cell cloning can often uncover endogenously expressed recep- 
tors (unpublished observations), it is imperative to see the desired 
function in multiple transfections in different cell lines. Second, each 
cell line has a unique characteristic that can be used to enhance 
different aspects of the study. For example, C6 cells grow very fast 
and are easy to culture and, thus, are good for generating lots of 
membranes for binding. SK-N-MC cells give robust cAMP accumu- 
lation and give efficient coupling for inhibition of adenylate cyclase. 
L cells consistently transfect well and have few endogenous recep- 
tors, and, thus, are good for reliable initial characterization of re- 
combinant receptors. It should be noted that inhibition of adenylate 
cyclase and [3H]R-α-methylhistamine binding were observed in all of 
the GPCR97-transfected cells. Only the best responding cell lines 
were used for further study. 

cAMP Accumulation. Transfected cells were plated on 96-well 
plates. Overnight cultures were then incubated with Dulbecco's 
modified Eagle's medium-F12 media containing isobutylmethylxanthine 
(2 mM) for 20 min, treated with agonists, antagonists, or both for 5 
min, and then treated with forskolin (10 µM) for 20 min. The reaction 
was stopped with 1/5 volume 0.5 N HCl. Cell media were then tested 
for cAMP concentration by radioimmunoassay with cAMP 
Flashplates. 

Calcium Mobilization. Transfected cells were plated on black 
96-well plates with clear bottoms. Overnight cultures were then 
incubated with Dulbecco's modified Eagle's medium-F12 media 
containing the fluorescent calcium indicator fluo-3 (4 µM) and 
probenicid (2 mM) for 60 min. Ligand-induced fluorescence was then 
measured on a Fluorometric Imaging Plate Reader (FLIPR; Molecular 
Devices, Sunnyvale, CA). 

R-α-Methyl[3H]histamine Binding. Cell pellets from GPCR97-expressing 
C6 cells were homogenized in 20 mM Tris-HCl/0.5 mM 
EDTA. Supernatants from an 800g spin were collected and recentrifuged 
at 30,000g for 30 min. Pellets were rehomogenized in 50 mM 
Tris/5 mM EDTA (pH 7.4). Membranes were incubated with 0.4 nM 
R-α-methyl[3H]histamine plus/minus test compounds for 45 min at 
25°C and harvested by rapid filtration over GF/C glass fiber filters 
(pretreated with 0.3% polyethylenimine), followed by four washes 
with ice-cold buffer. Nonspecific binding was defined with 10 µM 
histamine. pKi values were calculated based on a Kd of 150 pM and 
a ligand concentration of 400 pM (Cheng and Prusoff, 1973). 
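
The conversion cited above can be illustrated with a minimal sketch of the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), using the Kd (150 pM) and radioligand concentration (400 pM) stated in the text. The function names and the example IC50 are hypothetical and are not taken from the paper; the authors' own curve fitting was done in a commercial program.

```python
import math

KD_M = 150e-12      # radioligand Kd stated in the text (150 pM)
LIGAND_M = 400e-12  # radioligand concentration stated in the text (400 pM)


def cheng_prusoff_ki(ic50_m, ligand_m=LIGAND_M, kd_m=KD_M):
    """Return Ki (molar) from a competition IC50: Ki = IC50 / (1 + [L]/Kd)."""
    if ic50_m <= 0 or ligand_m < 0 or kd_m <= 0:
        raise ValueError("concentrations must be positive")
    return ic50_m / (1.0 + ligand_m / kd_m)


def pki(ic50_m, ligand_m=LIGAND_M, kd_m=KD_M):
    """Return pKi, the negative base-10 logarithm of Ki in molar units."""
    return -math.log10(cheng_prusoff_ki(ic50_m, ligand_m, kd_m))


if __name__ == "__main__":
    hypothetical_ic50 = 1e-8  # 10 nM, an illustrative value only
    print(f"Ki  = {cheng_prusoff_ki(hypothetical_ic50):.3e} M")
    print(f"pKi = {pki(hypothetical_ic50):.2f}")
```

With these assay conditions the correction factor 1 + [L]/Kd is about 3.67, so each Ki is roughly 3.7-fold lower than the corresponding IC50.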

In Situ Hybridization. Three adult male Sprague-Dawley rats 
were perfused with 4% paraformaldehyde in 0.1 M borate buffer 
fixative, and their brain tissues were postfixed overnight in fixative 
with 10% sucrose and frozen in dry ice. Five 1-in-5 series of 30-µm-thick 
coronal sections of the whole brain were cut on a sliding 
microtome and mounted onto glass slides. In situ hybridization was 
performed with 35 S-riboprobes on this tissue by an adapted protocol 
(Simmons et al., 1989). Then the tissue samples were put on X-ray 
film for 1 day, after which they were dipped in NBT2 nuclear emul- 
sion (Eastman Kodak Co., Rochester, NY), and kept desiccated in the 
dark at 4°C for 6 days. Slides were developed, were Nissl stained, 
and were studied under the microscope to identify structures labeled 
with the GPCR97 cRNA probe. 

RNA Probes. The cRNA probe was constructed from a partial rat 
GPCR97 cDNA clone originally identified by polymerase chain reaction 
(PCR) amplification from rat brain cDNA with primers designed 
against the human receptor (5' primer, 
5'-AGTCGGATCCAGCTACGACCGCTTCCTGTC-3'; 3' primer, 
5'-AGTCAAGCTTGGAGCCCCTCTTGAGTGAGC-3'). The resulting ~607-base pair (bp) 
fragment was ligated into pBluescript (Stratagene, La Jolla, CA). 
35S-UTP-labeled antisense and sense probes for rat GPCR97 were 
synthesized after linearization with BamHI or HindIII with T7 or T3 
RNA polymerase, respectively. The labeled sense strands served as 
controls and did not show any specific labeling of cellular localization 
(data not shown). Specific activities of 35S-UTP probes were 
approximately 2 to 3 × 10⁶ counts per minute/µg. All restriction enzymes 
and phage RNA polymerases were obtained from Boehringer Mannheim 
(Indianapolis, IN). 
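
As a small consistency check on the primer design described above, the sketch below looks for the BamHI (GGATCC) and HindIII (AAGCTT) recognition sequences in the two primers. The primer strings are transcribed from the scanned text with line-break hyphens removed, so they are an assumption about the printed sequences; the helper names are hypothetical.

```python
# Primer sequences as read from the scanned text (line-break hyphens removed).
PRIMER_5 = "AGTCGGATCCAGCTACGACCGCTTCCTGTC"
PRIMER_3 = "AGTCAAGCTTGGAGCCCCTCTTGAGTGAGC"

# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(seq, sites=SITES):
    """Map each enzyme whose site occurs in seq to the 0-based start index."""
    return {name: seq.find(site) for name, site in sites.items() if site in seq}


if __name__ == "__main__":
    print("5' primer:", find_sites(PRIMER_5))
    print("3' primer:", find_sites(PRIMER_3))
```

Each primer carries one site after a four-base 5' extension, which is consistent with the later linearization of the cloned fragment with BamHI or HindIII.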

Northern Blot Analysis. Northern blots obtained from Clontech 
(Palo Alto, CA) were hybridized with α-32P-dCTP-labeled (Amersham 
Pharmacia Biotech, Piscataway, NJ) human GPCR97 cDNA as 
described by the manufacturer (Expresshyb, Clontech). Two million 
counts per milliliter was used in a total volume of 10 ml of hybridization 
buffer and incubated at 68°C for 2 h. The blot was then 
washed two times at RT in 2× standard saline citrate and 0.05% 
SDS for 30 min each. It was further washed two more times for 30 
min each at 60°C and exposed overnight to film. 

Results 

Cloning and Sequence Analysis of GPCR97 cDNA. 

GPCR97 was initially identified as an EST in a basic local 
alignment search tool (Altschul et al., 1990) search of the LifeSeq 
database (Incyte Pharmaceuticals, Palo Alto, CA) with 
the α2-adrenergic receptor sequence as a query. The 5' end of 
the GPCR97 EST had approximately 35% homology to the 
seventh transmembrane domain of the α2-adrenergic receptor. 
Semiquantitative PCR of GPCR97 with cDNA templates 
from a variety of human tissues showed expression predominantly 
in the central nervous system, with the greatest 
intensity in the thalamus. Therefore, we constructed a 
size-selected human thalamus cDNA library and screened it with 
the original EST fragment as a labeled probe. From this 
screen, a full-length 2.7-kb clone consisting of a 298-bp 
5'-untranslated region, a 1335-bp open reading frame, and a 
1100-bp 3'-untranslated region was obtained. Translation of 
the open reading frame revealed a 445-amino acid coding 
region with low homology (20-27%) to the biogenic amine 
subfamily of GPCRs. Most notable was an aspartic acid residue 
in the putative transmembrane domain III, the putative 
binding site for the primary amine, which is a clear hallmark 
of the biogenic amine receptor subfamily (Fig. 1). This conserved 
aspartic acid residue is shown in the alignment of the 


TMl 

HI MSLPN SSCLLEDKMCEGNKTTMASPQLMPLVWIiSTICLVTVGLNLLVLYAVR 

H2 MAPNG TASSFCL-DSTAC — K ITITVVLAVLILITVAGNVVVCLAVG 

GPCR97 ME RAP PDGP LNASGALAGDAAAAGGARGF SAAWTAVLAALMALL IVATVTjGNAIjVMIiAFV 

* * ***** 

TM2 

HI SERKLHWGNLYIVSLSVADLIVGAVVliPMNILyLLMSKWSr^RPIjCLFWLSMDYVASTA 
H2 LNRRLRNLTNCFIVSLAITDLLLGIiLVLPFSAIYQLSCKWSPGKVFCNIYTSLDVMLCTA 
GPCR97 ADSSLRTQNNFFLLNLAISDFLVGAFCIPLYVPYVLTGRVTTFGRGLCKLWLVVDYLLCTS 
* * *** * ***** * * 

TM3 TM4 

HI SIFSVFILCIDRYRSVQQPLRYLKYRTKTR-ASATILGAWFLSFLWVIP — ILGWNHFMQ 

H2 S ILNLFMISLDRYCAVMDPLRYPVLVTPVR-VAI SLVLIWVIS I TLSFLSIHLGWNSRNE 

GPCR97 SAFNIVLISYDRFLSVTRAVSYRAQQGDTRRAVRKMLLVWVLAFLliYGP-AILSWEYLSG 
* ***** * ** 

TM5 

HI QTSVRRED-KCETDFYDVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHCQHRELINR 
H2 TSKGNHTTSKCKVQVNEV--YGLVDGLVTFYLPLLIMCITYYRIFKVARDQAKR INH 

GPCR97 GSS I PEGH- - CYAEFFYNWYFLI TASTLEFFTPFLSVTFFNLS I YLNI - - Q - RRTRLRLD 

* * * * * 

HI SLPSFSEIKLRPENPKGDAKKPGKESPWEVLKRKPKDAGGGSVLKSPSQTPKEMKSPWF 

H2 ISSWKAATIREH 

GPCR97 GARE AAGPE P P PE AQ P S P P P P PGCWGCWQ KGHGE AMPLHRYGVGE AAVGAE AGE ATLGGG 

* 

HI SQEDDREVDKLYCFPLDIVHMQAAAEGSSRDYVAVNRSHGQLKTDEQGLNTHGASEISED 

H2 

GPCR97 GGGGSVASPTSSSGSSSRGTERPRSLKRGSKPSASSASLEKRMKMVSQSFTQRFRLSRDR 

HI QMLGDSQSFSRTDSDTTTETAPGKGKLRSGSNTGLDYIKFTWKRLRSHSRQYVSGLHMNR 

H2 

GPCR97 

TM6 

HI ERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKNCCNEHL 

H2 --KATVTLAAWGAFIIOTFPYFTAFVTfRGLRGDDAINEVLEAIVNASQLSRTQSREPRQ 

GPCR97 — KVAKSLAVIVSIFGLCWAPYTLLMIIRAACHGHCVPDYW 

* * ***** 

TM7 

HI HMFTIWLGYINSTLNPLIYPLCNENFKKTFKRILHIRS 

H2 QEEKPLKLQVWSGTEVTAPQGATDRLWLGYANSALNPILYAALNRDFRTGYQQLFCCRL 

GPCR97 YETSFWLLWANSAVNPVLYPLCHHSFRRAFTKLLCPQK 

** ***** * 

HI 

H2 ANRNSHKTSLRS-- 
GPCR97 LKIQPHSSLEHCWK 



Fig. 1. Amino acid sequence of human GPCR97 receptor compared with the human histamine H1 and H2 receptors. Putative transmembrane domains 
are stated above the sequence and indicated by a solid line. Residues that are identical among all three receptors are indicated by an * below the 
sequence. DNA and protein sequences have been deposited with GenBank (accession no. AF140538). 

predicted amino acid sequence of GPCR97 with the human 
histamine H1 and H2 receptors. Overall homology between 
GPCR97 and the H1 and H2 receptors is 22% and 21.4%, 
respectively. 

GPCR97-Expressing Cells Inhibit Adenylate Cyclase 
in Response to Histamine. Given the homology of GPCR97 
to the biogenic amine family, we first tested its ability to 
respond to several of the amine neurotransmitters, measuring 
either the stimulation of calcium mobilization or the 
increase or decrease of cAMP accumulation in mouse L cells. 
The biogenic amine ligands tested (acetylcholine, dopamine, 
imidazole, epinephrine, tryptamine, serotonin, and histamine) 
were negative for an increase in either calcium mobilization 
or cAMP accumulation (not shown). However, after 
forskolin stimulation of basal cAMP accumulation, there was 
a selective and marked inhibition of adenylate cyclase in 
response to histamine in the transfected cell line but not in 
the nontransfected cell line (Fig. 2). This effect was mimicked 
by the high-affinity H3 agonist R-α-methylhistamine, which 
has an EC50 of 1 nM (Fig. 3). In addition, the effect of 
R-α-methylhistamine could be blocked by the known selective 
H3 antagonists thioperamide and clobenpropit (Fig. 3) 
but not by the H1 antagonist diphenhydramine (Fig. 3) or the 
H2 antagonist ranitidine (not shown). 

GPCR97-Expressing Cells Bind the High-Affinity 
Histamine H3 Ligand R-α-Methyl[3H]histamine. To confirm 
the H3 pharmacology, we examined whether the 
GPCR97-transfected cells could bind the H3 ligand 
R-α-methyl[3H]histamine. For these studies, we transfected a 
different cell line (C6 glioma cells) because of its ability to 
grow fast. C6 cells transfected with GPCR97 were able to 
bind [3H]R-α-methylhistamine with high affinity (Fig. 4, inset), 
whereas untransfected cells had no demonstrable binding 
(not shown). In addition, the known H3 agonists (histamine, 
imetit, and N-methylhistamine) and antagonists 
(thioperamide and clobenpropit) could all compete for binding 
(Fig. 4) with a rank order of potency consistent with that 
described for the histamine H3 receptor (Table 1). 

GPCR97 is Expressed Abundantly in the Central 
Nervous System. Because the pharmacological profile of 
GPCR97 was consistent with the histamine H3 receptor, we 
investigated the mRNA distribution and compared it to the 
known distribution of H3 binding sites. Northern blots of 
human mRNA showed expression only in the brain, most 
notably in the thalamus and the caudate nucleus (Fig. 5). 
Little expression was observed in any peripheral tissue examined 
(heart, placenta, lung, liver, skeletal muscle, kidney, 
pancreas, spleen, thymus, prostate, testis, ovaries, small intestine, 
colon, stomach, thyroid, lymph node, trachea, and 
bone marrow; data not shown). To obtain a rat homolog of the 
GPCR97 cDNA, we used oligonucleotide primers designed 
from the human sequence to amplify a cDNA fragment from 
RNA extracted from rat brain. This rat cDNA probe (which 
has 85% nucleotide identity to human GPCR97) was subsequently 
used to examine the tissue distribution of GPCR97-encoded 
mRNA by in situ hybridization in rat brain sections. 
GPCR97 mRNA is abundantly expressed in rat brain and is 
most notably observed throughout the thalamus, the ventromedial 
hypothalamus, and the caudate nucleus (Fig. 6, A and 
B). Strong expression was also seen in layers II, V, and VIb of 
the cerebral cortex, in the pyramidal layers (CA1 and CA2) of 
the hippocampus, and in olfactory tubercle (Fig. 6, A and B). 
Because the H3 receptor functions as an inhibitory presynaptic 
receptor, it is expected that the mRNA localization may 
not exactly match the functional receptor localization, depending 
on the axonal length of the neuron expressing it. For 
example, noradrenergic cells in the locus ceruleus project to 
all areas of the cerebral cortex where histamine, via H3 
receptors, is known to regulate noradrenaline release 
(Schlicker et al., 1989; Smits and Mulder, 1991). Therefore, it 
was predicted and confirmed that the mRNA for GPCR97 
was expressed in the locus ceruleus (Fig. 6, C and E). In 
addition, because the H3 receptor has also been functionally 
demonstrated on the histamine terminals in the cerebral 
cortex (Arrang et al., 1983), its mRNA must also be located in 
the histaminergic cell bodies in the tuberomammillary nuclei. 
This was also confirmed for GPCR97 (Fig. 6D). 

Fig. 2. Inhibition of cAMP accumulation in response to the various amine 
transmitters. Cells were treated with 10 µM forskolin 5 min after the 
addition of compounds (1 µM) and incubated for an additional 20 min. All 
values were determined in duplicate. Error bars represent S.E.M. 

Fig. 3. Inhibition of cAMP accumulation in response to the agonist 
R-α-methylhistamine. Cells were treated with 10 µM forskolin 5 min 
after the addition of R-α-methylhistamine and incubated for an additional 
20 min. Where indicated, antagonists (1 µM) were incubated 5 min 
before the addition of the agonist alone (■), with diphenhydramine (♦), 
with thioperamide U), or with clobenpropit (•). All values are determined 
in triplicate. Error bars represent S.E.M. 

There are numerous reports of presynaptic H3 receptors in 
the autonomic nervous system controlling neurotransmitter 
release in the heart, the lung, and the gastrointestinal tract 
(Arrang et al., 1988; Molderings et al., 1992; Bertaccini and 
Coruzzi, 1995; Imamura et al., 1995; Stark et al., 1996a). 
GPCR97 mRNA was detected by PCR amplification in RNA 
extracted from human small intestine, testis, and prostate 

Fig. 4. Top, saturation isotherm and Scatchard transformation (inset) of 
R-α-methyl[3H]histamine binding to GPCR97-transfected C6 cells. Total binding 
(■), nonspecific binding (▲), and specific binding (○) are shown. Bottom, 
competition binding of [3H]R-α-methylhistamine (0.4 nM) in the presence 
of various concentrations of H3 agonists and antagonists. KD was calculated 
as -1/slope from the linear Scatchard transformation. pIC50 values 
were determined by a single site curve fitting program (Prism; GraphPad 
Software, San Diego, CA) and converted to pKi values according to Cheng 
and Prusoff (1973). 
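
The Scatchard step in the Fig. 4 legend can be sketched as follows. For one-site binding, bound/free plotted against bound is a straight line with slope -1/KD and x-intercept Bmax. The data here are synthetic, generated from the 150 pM Kd given in the Methods and a hypothetical Bmax; the function names and the plain least-squares fit are illustrative assumptions, not the authors' analysis.

```python
def specific_binding(free, bmax, kd):
    """One-site saturation binding: B = Bmax * F / (Kd + F)."""
    return bmax * free / (kd + free)


def scatchard_fit(bound, free):
    """Fit bound/free versus bound by least squares.

    Returns (kd, bmax), where kd = -1/slope and bmax is the x-intercept.
    """
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_x = sum(bound) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in bound)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, -intercept / slope


if __name__ == "__main__":
    free_pm = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]  # free ligand, pM
    # Hypothetical Bmax of 1000 (arbitrary units); Kd of 150 pM from the text.
    bound = [specific_binding(f, 1000.0, 150.0) for f in free_pm]
    kd, bmax = scatchard_fit(bound, free_pm)
    print(f"Kd = {kd:.1f} pM, Bmax = {bmax:.1f}")
```

With real data, nonlinear fitting of the saturation curve is generally less sensitive to noise than the linear transformation, which is shown here only because the legend describes it.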

TABLE 1 

pKi values of known histamine agonists and antagonists 

Compound              pKi 
N-methylhistamine     -9.8 
Imetit                -9.7 
Immepip               -9.7 
Clobenpropit          -9.3 
Histamine             -8.5 
Thioperamide          -7.7 
Ranitidine            >-5 
Diphenhydramine       >-5 
Clozapine             >-5 
Cirazoline            >-5 
Mepyramine            >-5 
Imidazole             >-5 



tissues, but was not detected in these tissues by Northern 
blot analysis (not shown). If GPCR97 was only expressed in 
the neuronal plexus, its overall low abundance in a whole 
tissue preparation could account for this discrepancy. We are 
currently investigating via in situ hybridization whether the 
GPCR97 receptor mRNA is produced in the ganglia of the 
autonomic and enteric nervous systems. An alternative explanation 
for the absence of clear peripheral expression could 
be the existence of additional subtypes of the H3 receptor, 
which previously has been suggested based on pharmacological 
evidence (West et al., 1990; Raible et al., 1994; Leurs et 
al., 1996; Schlicker et al., 1996). 

Discussion 

The present data describe the cloning and characterization 
of a novel GPCR, GPCR97, with a pharmacology and a 
tissue distribution that is consistent with the histamine H3 
receptor subtype. We found that cells transfected with 
GPCR97 were able to inhibit adenylate cyclase in response to 
histamine. Because the two known cloned histamine receptors, 
H1 and H2, activate phosphoinositide hydrolysis and 
stimulation of adenylate cyclase, respectively, the inhibition 
of adenylate cyclase that we observed is a new finding for a 
cloned histamine receptor. It should be noted that previous 
experiments with pertussis toxin- and histamine-stimulated 
35S-GTPγS binding have suggested that the H3 receptor 
might be Gi-linked (Clark et al., 1993; Laitinen and Jokinen, 
1998). Because the putative H3 histamine receptor has been 
pharmacologically defined (Arrang et al., 1987; Leurs et al., 
1998), we were able to test known selective agonists and 
antagonists. The selective H3 agonist R-α-methylhistamine 
was able to potently and dose-dependently inhibit 
forskolin-stimulated adenylate cyclase, an effect that was mimicked by 
two additional H3 agonists, imetit and N-α-methylhistamine 
(data not shown). In addition, the effect of iZ-a-methylhista- 
mine was blocked by the selective H 3 antagonists thioperam- 
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Values were determined by competition binding with i?-a-methyl[ 3 H] histamine to 
GPCR97-expressing cell membranes. 
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Fig. 5. Northern blot analysis of human brain mRNA samples (5 ug of 
poly(A)^ RN A/lane). Lane 1, amygdala. Lane 2, caudate. Lane 3, corpus 
callosum; Lane 4, hippocampus. Lane 5, whole brain. Lane 6, substantia 
nigra. Lane 7, thalamus. The probe was the full-length GPCR97 coding 
sequence. Exposure time to film was 3 days (-80*C). 
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Fig. 6. Distribution of GPCR97 mRNA in rat brain. Representative film 
autoradiograms of coronal sections arranged rostral to caudal (A-C) and 
darkfield photomicrographs of coronal brain sections showing GPCR97 
mRNA in the ventral portion of the tuberomammillary nucleus (D), and 
in the locus ceruleus (E). Magnification, D = 100 x and E = 40 x. Abbre- 
viations: CA1, CA2, pyramidal layers of the hippocampus; CP, caudopu- 
tamen; Cx, cortex; EPd, endopiriform nucleus, dorsal part; LC, locus 
ceruleus; OT, olfactory tubercle; Th, thalamus; TTMv, tuberomammillary 
nucleus, ventral portion; VMH, ventromedial hypothalamus. 



ide and clobenpropit but not by the H x or H 2 antagonists 
diphenhydramine or ranitidine. GPCR97-transfected cells 
also bound the high-affinity H 3 agonist i?-a-metiiylpH] histamine. 
All of the tested H 3 agonists and antagonist could compete for 
specific i?-a-methyl[ 3 H] histamine binding with similar po- 
tencies to those reported for these compounds to brain mem- 
branes (Hill et al., 1997). It has been suggested that clozapine 
may impart some of its antipsychotic effects in humans 
through H 3 receptor antagonism (Kathmann et al., 1994; 
Rodrigues et al., 1995; Stark et al., 1996b). We found that 
clozapine did not significantly compete for binding to the 
recombinant human receptor (Table 1). These differences in 
pharmacology may be because of species differences or 
possible H 3 heterogeneity (West et al., 1990). 

One of the most striking features of this receptor is the
abundant expression in the central nervous system, particularly
in the caudate, the thalamus, and the cortex. Thus, it is
surprising that this receptor cDNA has eluded so many cloning
attempts over the years. To explain the previous unsuccessful
attempts to clone the H3 receptor, we compared the
sequence of GPCR97 to that of the H1 and H2 receptors (Fig.
1). The low overall homology among these three receptors
suggests, in retrospect, that low-stringency hybridization
approaches or degenerate PCR would not have been fruitful. In
addition, we searched the public EST databases with the
entire H3 receptor mRNA sequence. We found that the H3
receptor exists in the public domain in several clones derived
from human brain libraries. However, all of these clones
primarily contain only 3'-untranslated sequence, suggesting
that there may be some secondary structure present that
prevents a full-length H3-encoding mRNA from being
efficiently copied by reverse transcription. Our success in
screening the human thalamus may be due to the abundance of
the mRNA in that specific brain region, coupled with the fact
that we size-selected for mRNAs greater than 2.5 kb.

There are many questions that remain to be answered
about the histamine H3 receptor that we can now begin to
answer with the cDNA. For example, are there additional H3
receptor subtypes? What additional neurotransmitter systems
are regulated by histamine H3 receptors? Are H3 receptors
expressed on nonneuronal cells in the periphery? We are
currently seeking to answer some of these questions. In
addition, we are inactivating the H3 receptor gene in mice (i.e.,
knockout mice) to identify its role in central nervous system
function and memory control and as a means to look for
additional phenotypes, which may lead to a better
understanding of the physiological role of H3 receptors in normal
and pathological states.

Orphan G protein-coupled
receptors: a neglected
opportunity for pioneer
drug discovery

Jeffrey M. Stadel, Shelagh Wilson and
Derk J. Bergsma

Access to DNA databases has introduced an exciting
new dimension to the way biomedical research is
conducted. 'Genomic research' offers tremendous
opportunity for accelerating the identification of the
cause of disease at the molecular level and thereby
fostering the discovery of more selective medicines to
improve human health and longevity. The current
challenge is to close the gap rapidly between gene
identification and clinical development of efficacious
therapeutics. In the present review, Jeffrey Stadel,
Shelagh Wilson and Derk Bergsma outline the
rationale and describe strategies for converting one
large class of novel genes, orphan G protein-coupled
receptors (GPCRs), into therapeutic targets.
Historically, the superfamily of GPCRs has proven to be
among the most successful drug targets and
consequently these newly isolated orphan receptors
have great potential for pioneer drug discovery.

The advent of rapid DNA sequencing spawned the
'genomic era', which has led to the initiation of the Human
Genome Project. The novel technologies developed
in association with genomic research have already had a
significant impact on the way investigations into the
basis of disease are being conducted and will, no doubt,
substantially enhance the means by which diseases are
diagnosed and treated in the near future. To keep pace
with the evolution of molecular medicine, the pharmaceutical
industry has embraced genomics and is attempting
to exploit the new technologies to identify novel targets
for drug discovery. The major questions that remain
to be addressed concern how to convert genomic
sequences into therapeutic targets in an expeditious
manner and eventually to obtain pharmaceutical drugs
that will enhance the quality of life. This review will deal
with a single class of novel molecular targets, focusing
on the burgeoning collection of G protein-coupled
receptors (GPCRs) called 'orphan' receptors (Ref. 1).
GPCRs are a superfamily of integral plasma membrane
proteins involved in a broad array of signalling pathways.
Since the first cloning of GPCR gene sequences
over a decade ago, novel members of the GPCR
superfamily have continued to emerge through cloning
activities as well as through bioinformatic analyses of
sequence databases, although their ligands are unidentified
and their physiological relevance remains to be
defined. These 'orphan' receptors provide a rich source
of potential targets for drug discovery.

The members of the GPCR superfamily are related
both structurally and functionally. The signature motif
of these receptors is seven distinct hydrophobic
domains, each of which is 20-30 amino acids long and
which are linked by hydrophilic amino acid sequences of
varied length (Refs 2,3). Biophysical (Ref. 4) and
biochemical (Ref. 5) studies support the notion that these
receptors are intercalated into the plasma membrane with
the amino terminus extracellular and the carboxy terminus
in the cytoplasmic portion of the cell. Therefore, these
receptors are often referred to as seven transmembrane
(or 7TM) receptors. While it is not yet known how many
individual genes actually encode these receptors, it is clear
that this family of proteins is one of the largest yet identified.
Functionally, GPCRs share the property that
upon agonist binding they transmit signals across the
plasma membrane through an interaction with heterotrimeric
G proteins (Refs 6,7). These receptors respond to a vast
range of agents (Ref. 8) such as protein hormones,
chemokines, peptides, small biogenic amines, lipid-derived
messengers, divalent cations (e.g. a Ca2+ sensor
has been identified that is a GPCR; Ref. 9) and even proteases
such as thrombin, which activates its receptor by cleaving
off a portion of the amino terminus (Ref. 10). Finally, these
receptors play an important role in sensory perception
including vision and smell (Refs 5,8). Correlated with the broad
range of agents that activate these receptors is their existence
in a wide variety of cells and tissue types, indicating
that they play roles in a diverse range of physiological
processes. It is likely, therefore, that the GPCR
superfamily is involved in a variety of pathologies. This
point was recently emphasized by the surprising discovery
that certain GPCRs for chemokines act as co-factors
for HIV infection (Refs 11-13).
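
The seven-hydrophobic-domain signature described above can be looked for
computationally. The following is a minimal sketch, not a method taken
from this review: it assumes the Kyte-Doolittle hydropathy scale with a
19-residue sliding window and a mean-hydropathy cutoff of 1.6, which are
conventional choices for flagging candidate membrane-spanning segments.
The function name and the merging rule are hypothetical, and real
sequences would normally be analysed with more sophisticated tools.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans whose windowed mean hydropathy >= threshold.

    Spans are 0-based and end-exclusive. Overlapping or touching windows
    are merged into one candidate segment. Window size and threshold are
    assumed defaults, not values from the review.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    scores = [KYTE_DOOLITTLE[res] for res in seq]  # KeyError on unknown residues
    segments = []
    window_sum = sum(scores[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window one residue to the right.
            window_sum += scores[start + window - 1] - scores[start - 1]
        if window_sum / window >= threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous span
            else:
                segments.append((start, end))
    return segments


def looks_like_7tm(seq):
    """Crude check: exactly seven candidate hydrophobic segments."""
    return len(hydrophobic_segments(seq)) == 7
```

A count of seven candidate segments is only suggestive. Other membrane
proteins can give the same count, and closely spaced helices may merge
into a single span under this simple rule.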

GPCRs represent the primary mechanism by which
cells sense alterations in their external environment and
convey that information to the cells' interior. The binding
of an agonist to the receptor promotes conformational
changes in the cytoplasmic domains that lead to the
interaction of the receptor with its cognate G protein(s).
Agonist-promoted coupling between receptors and G
proteins leads to the activation of intracellular effectors
that substantially amplify the production of second
messengers feeding into the signalling cascade. Since
effectors are often enzymes [e.g. adenylate cyclase (Ref. 14),
which converts ATP to cAMP, or phospholipase C
(Ref. 15), which hydrolyses inositol lipids in membranes
to release inositol trisphosphate, which in turn mobilizes
Ca2+ within a cell] or ion channels (Ref. 16), many second
messenger molecules can be produced as the result of a
single agonist binding event with its receptor. Changes
in the intracellular levels of ions or cAMP, or both,
result in the modulation of distinct phosphorylation
cascades (Refs 17,18), extending through the cytosol to the
nucleus, that eventually culminate in the physiological
response of the cell to the extracellular stimulus.
Although the overall paradigm is apparently the same
for all GPCRs, the diversity of receptors, G proteins
and effectors suggests a myriad of potential signalling
processes and this becomes an important concept as we
try to identify the function of orphan GPCRs.

Fig. 1. Comparison of the protein sequence identity of the orphan APJ
receptor (Ref. 19) with the angiotensin AT1 receptor (Ref. 20). The filled
circles indicate amino acid identity (29.9%) between the two G
protein-coupled receptors (GPCRs). This is a typical example of the protein
sequence identity shared between orphan and known GPCRs.

To date, more than 800 GPCRs have actually been
cloned from a variety of eukaryotic species, from fungi to
humans [see L. F. Kolakowski in GCRDb-WWW The G
Protein-Coupled Receptor DataBase World-Wide-Web
Site (http://receptor.mgh.harvard.edu/GCRDBHOME.html)].
For humans, the most represented species,
about 140 GPCRs have been cloned for which the cognate
ligands are also known. This number excludes the
sensory olfactory receptors, of which hundreds to thousands
are predicted to exist. By traditional molecular
genetic approaches, coupled with the explosion in
genomic information, it has been possible to identify
more than 100 additional orphan GPCR family members.
By definition, there is enough sequence information in
the receptor cDNAs to place them clearly in the superfamily
of GPCRs, but often there is insufficient sequence
homology with known members of this family to be able
to assign their ligands with confidence or predict their
function. In total there are currently over 240 human
GPCRs, excluding sensory receptors. As the size of
sequence databases continues to increase, this list is
expected to grow to 400, and perhaps even to 1000 or
more unique gene products. The list will grow even further
as paralogues and alternatively spliced GPCR variants
emerge. Most orphan GPCRs share a low degree of
sequence homology (typically about 25-35% overall
amino acid sequence identity) with known GPCRs,
suggesting that they belong to new subgroups of receptors
(Fig. 1) (Refs 19,20). Indeed, several orphan GPCRs show closer
homology to each other than to known GPCRs. Nevertheless,
the majority of orphan receptors are phylogenetically
distributed among a broad spectrum of distantly
related, known receptor subgroups.
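
To make the quoted identity figures concrete, the sketch below shows one
common way a percentage such as the 25-35% range might be computed from a
pairwise alignment. It is an illustration only: the review does not state
which denominator was used, so the choice here (aligned, non-gap
positions) is an assumption, and the function name and example strings
are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned, non-gap positions.

    Both inputs must come from the same pairwise alignment, so they have
    equal length, with '-' marking gaps. Other conventions divide by the
    length of the shorter sequence or by the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared
```

For the toy alignment "MKT-LL" against "MRTALL", five columns are
compared and four match, giving 80% identity under this convention.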

What is the rationale for investing considerable time
and resources into trying to establish the function of
orphan GPCRs? Simply stated, GPCRs have a proven
history of being excellent therapeutic targets. Within the
past 20 years, several hundred new drugs have been registered
that are directed towards activating or antagonizing
GPCRs; in fact, it is estimated that most current
research within the pharmaceutical industry is focused
on this signalling pathway (Ref. 21). Table 1 shows a
representative snapshot of a variety of receptors, disease targets
and corresponding drugs. It is clear from this table that
the therapeutic targets span a wide range of disorders
and disease states. Another example of the significance
and versatility of GPCRs is the number of cases of genetic
diseases that are linked to defects in these proteins; some
of these diseases are indicated in Table 2 (Refs 22-38). It
is likely that many more genetic diseases will be mapped
to GPCRs as the era of genomics continues to expand and
families with inherited mutations are examined much
more comprehensively.

The importance of GPCRs to drug discovery continues
to be manifested by the fact that across the pharmaceutical
industry active research projects, ranging from basic
studies all the way through to advanced development,
are focused on GPCRs as primary targets. Molecular
biology has had a dramatic influence on these efforts.

Table 1. Examples of marketed drugs for G protein-coupled receptors (GPCRs)

GPCR                            Generic           Drug           Indication

Muscarinic acetylcholine        Bethanechol       Urecholine     GI
  [illegible]                   Dicyclomine       Bentyl         GI
  [illegible]                   Ipratropium       Atrovent       CP
Adrenoceptor
  β1                            Atenolol          Tenormin       CP
  α2                            Clonidine         Catapres       CP
                                Propranolol       Inderal        CP
  α1                            Terazosin         Hytrin         CP
  β2                            Albuterol         Ventolin       CP
  β1/β2/α1                      Carvedilol        Coreg          CP
Angiotensin
  AT1                           Losartan          [illegible]    CP
                                Eprosartan        Teveten        CP
Calcitonin                      Calcitonin        [illegible]    Osteoporosis
                                eel-Calcitonin    Elcatonin      Osteoporosis
Dopamine
  D2                            Metoclopramide    Reglan         GI
                                Ropinirole        Requip         CNS
  D2                            Haloperidol       Haldol         CNS
Gonadotropin-releasing factor   Goserelin         Zoladex        Cancer
                                Nafarelin         Synarel        Endometriosis
Histamine
  H1                            Dimenhydrinate    Dramamine      CNS
  H1                            Terfenadine       Seldane        CP
  [illegible]                   Cimetidine        Tagamet        GI
                                Ranitidine        Zantac         GI
Serotonin (5-HT)
  5-HT1D                        Sumatriptan       Imitrex        CNS
  [illegible]                   Ritanserin        Tisertan       CNS
  5-HT4                         Cisapride         Propulsid      GI
  5-HT[illegible]               Trazodone         Desyrel        CNS
  5-HT2A/2C                     Clozapine         Clozaril       CNS
Leukotriene                     Pranlukast        Onon           CP
                                Zafirlukast       Accolate       CP
Opioid
                                Buprenorphine     Buprenex       CNS
                                Butorphanol       Stadol         CNS
                                Alfentanil        Alfenta        CNS
                                Morphine          Kadian         CNS
Oxytocin                        [illegible]       Syntocinon     Labour
Prostaglandin                   Epoprostenol      Flolan         CP
                                Misoprostol       Cytotec        GI
Somatostatin                    Octreotide        Sandostatin    Cancer
Vasopressin                     Desmopressin      [illegible]    CP/Renal

CP, cardiopulmonary system; GI, gastrointestinal system.

Table 2. Diseases associated with mutations of G protein-coupled receptors (GPCRs)

GPCR                               Mutation                          Disease                              Refs

Rhodopsin                          Missense: Pro23 to His (NT)       Retinitis pigmentosa                 22,23
                                   Missense: Val87 to Asp (2TM)
                                   Missense: Tyr178 to Cys (2EL)
                                   Nonsense: Gln344 to Stop (CT)
Thyroid stimulating hormone        Missense: Asp619 to Gly (3IL)     Hyperfunctioning thyroid adenomas    24
                                   Missense: Ala623 to Ile (3IL)
Luteinizing hormone                Missense: Asp578 to Gly (6TM)     Precocious puberty                   25
Vasopressin V2                     Missense: Arg137 to His (2IL)     X-linked nephrogenic diabetes        26-28
                                   Missense: Gly185 to Cys (2EL)
                                   Frameshift at Arg[illegible]
Ca2+                               Missense: Arg[illegible]          Hyperparathyroidism, hypocalciuric   29,30
                                   Missense: Glu238 to Lys (NT)      hypercalcemia
                                   Missense: Arg795 to Trp (3IL)
                                   Missense: Glu128 to Ala (NT)
Parathyroid hormone (PTH type b)   Missense: His223 to Arg (1IL)     Short-limbed dwarfism                31
β3-Adrenoceptor                    Missense: Trp64 to Arg (1IL)      Obesity, NIDDM                       32-34
Growth-hormone-releasing hormone   Nonsense: Glu72 to Stop (NT)      Dwarfism                             35
Adrenocorticotropin                Missense: Ser74 to Ile (2TM)      Glucocorticoid deficiency            36
Glucagon                           Missense: Gly40 to Ser (NT)       Diabetes, hypertension               37,38

Abbreviations: CT, carboxyl terminus; EL, extracellular loop; IL, intracellular loop; NIDDM, non-insulin-dependent diabetes mellitus; NT, amino terminus;
TM, transmembrane segment.



The cloning of cDNAs for well-known GPCRs led to the
discovery of a surprising number of paralogues (Ref. 5). The
existence of these novel receptor subtypes was unexpected
because the current cornucopia of pharmacological
agents does not possess the required selectivity to
distinguish all of them clearly, and thus an opportunity
for drug discovery was quickly recognized. Current
research efforts seek to define the physiology associated
with these novel receptor subtypes and to discover
highly selective compounds as potential pharmaceutical
drugs. These efforts are almost exclusively focused on
GPCRs for which activating ligands are known. Since
characterized GPCRs were, and continue to be, attractive
therapeutic targets, it is most reasonable to speculate that
many of the orphan receptors have similar potential. The
initial challenge is to determine the function of each
orphan receptor through the identification of activating
ligands and, once the function is clarified, link the orphan
receptor to a specific disease and thus establish it as a
candidate for a comprehensive drug discovery effort.

Reverse molecular pharmacology 

Until recently, research into the identification of
GPCRs as targets for drug discovery has been conducted
using the traditional approach illustrated in Fig. 2. For
this strategy, the starting point is functional activity,
which forms the basis of an assay by which a ligand is
identified through purification from biological fluids,
cell supernatants or tissue extracts. One example of the
success of this strategy is the discovery of the potent
vasoconstricting peptide endothelin (Ref. 39). Once isolated, the
ligand is used to characterize its cellular and tissue biology
as well as its pathophysiological role. Subsequently,
cDNAs encoding corresponding receptors are 'fished'
from gene libraries using a variety of methodologies (e.g.
receptor purification and expression cloning) that often
either directly or indirectly use the ligand as the 'hook'.
As the nucleotide sequences for GPCRs begin to accumulate
and be analysed, additional receptors can be
cloned by homology screening, by positional cloning,
and by polymerase chain reaction (PCR) methodologies
that use oligonucleotide primers based on nucleotide
sequences conserved within the seven transmembrane
domains of the GPCR family. Once the cloned human
receptor cDNA is expressed in a heterologous cell system
(Ref. 40), it is used, together with its ligand, to form the basis
of a screen to explore chemical compound libraries for
receptor antagonists or agonists. Lead structures identified
in the screen are refined through medicinal chemistry
using an iterative process. Resulting drug leads
with appropriate in vivo pharmacology are passed on
into the clinic for development.

Recently, this paradigm has changed radically with the
introduction of a new reverse molecular pharmacological
strategy, shown diagrammatically in Fig. 2. Through both
traditional molecular cloning techniques and, more
recently, mass sequencing of expressed sequence tags
(ESTs) from cDNA libraries, it is now possible to identify
GPCRs through computational or bioinformatic
methodologies. The EST approach, initially proposed by
Sidney Brenner (University of Cambridge) and first
brought to large-scale practice by Craig Venter (The
Institute for Genomic Research), constitutes random,
single-pass sequencing of cDNAs randomly picked from a
collection of cDNA libraries, followed by extensive
bioinformatic analysis of the sequence to identify structural
signatures characteristic of GPCRs. Once new
members of the GPCR superfamily are identified, the
recombinantly expressed receptors are used in
functional assays to search for the associated novel ligands.
The receptor-ligand pair is then used for compound
bank screening to identify a lead compound that,
together with the activating ligand, is used for biological
and pathophysiological studies to determine the function
and potential therapeutic value of a receptor antagonist
(or agonist) in ameliorating a disease process. In
addition, clues as to therapeutic potential may involve
receptor genotyping of disease populations. Once a link
with a disease is finally identified, an appropriate compound
can be advanced for clinical study.

Fig. 2. Paradigm shift from classical to reverse molecular pharmacological
approaches for drug discovery.

The reverse molecular pharmacological strategy is a
far more daunting challenge and risky endeavour when
compared with the more traditional approach, since the
starting material for a drug discovery effort is simply an
orphan receptor of unknown function, with no apparent
relationship to a disease indication. However, the potential
reward of using this approach is that resultant drugs
naturally will be pioneer or innovative discoveries, and a
significant proportion of these unique drugs may be useful
to treat diseases for which existing therapies are lacking
or insufficient.

Screening strategy 

Figure 3 illustrates the generic strategy that we use 
for our reverse molecular pharmacological approach. In 
addition to the EST approach, which has yielded the 
majority of our collection of orphan receptors, we have 
also used a number of more traditional approaches such 
as low-stringency screening, using portions of known 
GPCRs as hybridization probes, as well as PCR-based 
methods. By these techniques we have succeeded in 
identifying more than 70 orphan receptors in addition to 
those already cited in the literature. 

Since cDNAs identified by EST cloning are often incomplete,
northern hybridization analysis is used to establish
the tissue or cell pattern of mRNA expression of the
GPCRs. This information is used to identify the tissue or
cell cDNA libraries that are to be probed for full-length
clones and, significantly, to determine whether a receptor
is expressed in a particular normal or diseased tissue of
interest. A highly selective tissue expression pattern may
also provide a clue with respect to receptor function. Once
obtained, full-length GPCR clones are expressed in mammalian
cell lines and yeast model systems (see below) for
functional analysis. Xenopus oocytes may also be used for
expression; however, low screening throughput limits
their use to a secondary, confirmatory assay system. For
mammalian cell expression, the human embryonic kidney
(HEK) 293 cell line or Chinese hamster ovary (CHO) cells
are frequently used. These cell types possess a large repertoire
of G proteins that are necessary for coupling to
downstream effectors in situ. They also share a reliable
history of positive functional coupling for a wide variety
of known GPCRs. However, since receptor coupling
cannot be accurately predicted from primary sequence
data, orphan GPCRs may need to be expressed in a
variety of cell lines to establish viable coupling.

These heterologous expression systems form the basis
for screening for an activating ligand. The success of
establishing functional coupling of the recombinant
receptor depends to a large extent on whether the receptor
is properly expressed, which may be assessed by
northern or Western blot analysis, and whether appropriate
G proteins and downstream effectors are present
in the cell in which the receptor is expressed. There are
several major technical challenges to be met in order to
initiate ligand fishing. Because it is difficult to predict
accurately the coupling specificity of orphan GPCRs
from their primary sequence, assays must be chosen
that will detect a wide range of coupling mechanisms.
These generally focus on changes in intracellular levels
of cAMP or Ca2+ but can also include more generic
measurements, such as metabolic activation of the cell
via the cytosensor microphysiometer (Ref. 41). Recently, it has
become possible to implement most of these screens in
high-throughput format by using fluorescence-based
assays and using charge-coupled device cameras and
reporter gene constructs that allow easy readout of the
assay on microtitre plates. Ever increasing throughput of
the assays will be necessary to screen large libraries.
However, this approach is somewhat cumbersome and
inefficient if all the assays described above have to be
used. Is it possible to funnel heterologous signal transduction
through a defined pathway? The prospect of an
assay for a single transduction pathway comes from the
observation that heterologous expression of the G protein
subunit Gα15/16 promoted coupling of various GPCR
subfamily members through activation of phospholipase
Cβ and likely Ca2+ mobilization (Refs 42,43). Although this
approach may not work universally, the diversity of the
GPCRs successfully coupled through Gα16 to phospholipid
metabolism suggests that this could be a useful
method to screen for orphan receptor activation.

Once heterologous receptor expression is achieved
and functional assays are in place, ligand fishing experiments
can be initiated. Although the homology with
known GPCRs is low, we nevertheless begin by screening
the orphans against known GPCR ligands; since the
sequence homology between some subtypes of known
receptors can be low (e.g. 30-40% between neuropeptide
Y receptor subtypes), it is possible that new paralogue
receptors for known ligands still remain to be discovered.
The next step is to search for novel activating
ligands by screening biological extracts obtained from
tissues, biological fluids and cell supernatants. An additional
option is screening libraries of compounds for
activating ligands. Complex libraries of peptides or compound
collections could be rich sources of 'surrogate'
agonists that would promote receptor activation and
coupling but are not endogenous ligands. The rationale
for searching for surrogate agonists springs from a report
that a nonpeptide agonist has been discovered for the
angiotensin II receptor (Ref. 44). There is also an obvious
precedent for nonpeptide agonists for opioid receptors.
Screening of the very large libraries that will be generated
by fractionation of biological extracts and by combinatorial
chemical synthesis requires that the functional
assays used not only have a high throughput but are also
robust, since false positives can be a significant problem.

Examples are beginning to emerge from several
efforts showing that progress has been made in characterizing
orphan GPCRs. A first example is the identification
of an orphan GPCR that functions as a calcitonin
gene-related peptide (CGRP) receptor (Ref. 45). CGRP is a
peptide of 37 amino acids, widely distributed in neurones,
and functions as a potent vasodilator. It may be involved
in migraine and has been implicated in non-insulin-dependent
diabetes mellitus because it promotes resistance
to insulin. An orphan GPCR EST was derived from
a human synovium cDNA library (Ref. 45). Sequence analysis
showed that the new GPCR has ~56% similarity to the
human calcitonin receptor and was hence originally
expected to be a new subtype of the calcitonin receptor.
The message for this novel receptor was expressed
predominantly in lung, which is known to be a relatively
rich source of CGRP receptors. Following full-length
cloning from a human lung library, the orphan receptor
cDNA was stably expressed in HEK293 cells. Both radioligand
binding using [125I]CGRP, as well as functional
assays of CGRP-stimulated cAMP accumulation,
demonstrated an appropriate pharmacological profile
for the expressed receptor similar to that observed with
endogenous CGRP receptors on human neuroblastoma
cells. In addition to identifying the CGRP receptor, the
reverse molecular pharmacology approach has also been
used to identify other orphan receptors, such as the
anaphylatoxin C3a receptor (Ref. 46).

The examples given above are for receptors with significant
homology to known GPCR superfamily members
and their activating ligands proved to be known
GPCR ligands. Will ligand fishing be successful in identifying
novel endogenous ligands? Recently, two groups
investigated an orphan opioid-like receptor, ORL1 (Refs
47 and 48). Both groups expressed the orphan GPCR in
CHO cells and challenged the transfected cells with a
series of opiate agonists, but without response. Both
groups then used a similar ligand fishing approach.
Taking crude extracts from rat brain (Ref. 47) or porcine
brain (Ref. 48), they screened against the stably transfected
cell lines using inhibition of adenylate cyclase activity as a
functional assay. They were able to fractionate the brain
extracts and identify the novel dynorphin-like ligand,
which they called nociceptin (Ref. 47) or orphanin FQ (Ref. 48).
Thus, both teams successfully established a functional
assay in transfected CHO cells that allowed the purification
of a novel neuropeptide ligand that is 17 amino
acids long for the orphan receptor. This work validates
the ligand fishing approach for characterizing the function
of orphan GPCRs.

Fig. 4. Yeast-based screen for the identification of agonists for orphan G
protein-coupled receptors (GPCRs). a: Normal, endogenous GPCR signalling in
yeast (Saccharomyces cerevisiae). b: Substitution of a human GPCR and a human
Gα subunit for yeast counterparts and modification of downstream signalling
pathways such that agonist stimulation of the recombinant GPCR promotes
growth. This yeast strain can be screened using biological extracts or
compound libraries, or both. c: Yeast cells can be engineered to secrete small
peptides from a random peptide library to identify autocrine surrogate peptide
agonists for recombinant orphan GPCRs. Modified from Ref. 49, with permission.

Concluding remarks and future challenges 

Although orphan GPCRs have been around for over 
ten years, very few companies have, until recently, been 
willing to risk their resources to explore opportunities 
among this category of receptors. However, the environ- 
ment for the pharmaceutical industry has changed due to 
the confluence of several major technological advances. 
The conversion of gene sequences encoding GPCRs to 
drug targets is substantially aided by the development of 
combinatorial chemistry methods and miniaturized high- 
throughput screening techniques. The future challenge 
for drug discovery in this arena is to integrate these 
technologies innovatively and productively. One glimpse 
of the future comes from the field of functional genomics. 
The endogenous GPCR transduction system of the 
yeast, Saccharomyces cerevisiae, which is the pheromone
pathway required for conjugation and mating, has been 
commandeered - through genetic engineering - to permit 
functional expression and coupling of human GPCRs and 



humanized G protein subunits to the endogenous sig-
nalling machinery (Refs 49-51) (Fig. 4). Further manipulations
involve conversion of the normal yeast response to 
pheromone or activating ligand (growth arrest) to positive 
growth on selective media or to reporter gene expression. 
In addition, yeast cells have been engineered to express 
and secrete small peptides from a random peptide library 
that will permit the autocrine activation of heterologously 
expressed human GPCRs (Refs 49 and 51). This provides 
an elegant means of screening rapidly for surrogate pep- 
tide agonists that activate orphan receptors. This yeast 
system is, of course, not limited to autocrine ligand screen- 
ing but can also be used in high-throughput mode to 
screen directly the fractions from biological extracts and 
the various chemical libraries as described above. A major
advantage of the yeast system over the mammalian 
heterologous expression systems is its ease of use and its 
lack of endogenous GPCRs, which can confound ligand 
fishing expeditions in mammalian cells. 

There is now tremendous pressure to be the first on 
the market with highly selective drugs that target thera- 
peutic areas of unmet medical need and ideally have 
novel mechanisms of action. As a consequence, the 
pharmaceutical industry has recognized the power of 
genomics to provide it with new and unique drug tar- 
gets. Genomics has responded with a plethora of novel 
proteins, included among them over 100 orphan GPCRs. 
Because of the proven link of GPCRs to a wide variety of 
diseases and the historical success of drugs that target 
GPCRs, we believe that these orphan receptors are 
among the best targets of the genomic era to advance 
into the drug discovery process. 
CA1A2X-competitive
inhibitors of
farnesyltransferase as
anti-cancer agents

Charles A. Omer and Nancy E. Kohl

For Ras oncoproteins to transform mammalian cells, 
they must be post-translationally farnesylated in a 
reaction catalysed by the enzyme farnesyl-protein 
transferase (FPTase). Inhibitors of FPTase have 
therefore been proposed as anti-cancer agents. In this 
review Charles Omer and Nancy Kohl discuss the 
development of FPTase inhibitors that are kinetically 
competitive with the protein substrate in the 
farnesylation reaction. These compounds are potent 
and selective inhibitors of the enzyme that block the 
tumourigenic phenotypes of ras-transformed cells and 
human tumour cells in cell culture and in animal 
models. 

Since the identification of farnesyl-protein transferase 
(FPTase) activity in mammalian cells, there has been an 
intense effort to develop inhibitors of this housekeeping 
enzyme for use as potential, novel anti-cancer agents (Refs 1, 2).
This idea stems from the fact that several of the proteins 
that regulate mammalian cell proliferation require a 
post-translational modification catalysed by this enzyme 
for biological activity. Efforts over the past eight years 
have yielded potent, cell-active inhibitors of FPTase 
that demonstrate anti-proliferative activity in cell 
culture and in rodent models of cancer. 

The focus of the FPTase inhibitor (FTI) studies has
been inhibition of the transforming activity of the Ras 



oncoproteins. Three ras genes, Ha-, N- and Ki-ras, encode 
four highly homologous, 21 kD proteins, Ha-, N-, Ki4A-
and Ki4B-Ras (Ki4A- and Ki4B-Ras are encoded by splice
variants of the Ki-ras gene; Ref. 3). Ras functions to regulate the
transduction of extracellular growth-promoting signals 
from membrane-bound receptor tyrosine kinases to 
intracellular growth-regulatory pathways. Typical of the 
low-molecular-weight G proteins, Ras is active when 
bound to GTP and inactive when bound to GDP. Cycling 
from the active to the inactive form is accomplished by 
the intrinsic GTPase activity of the protein. Mutations in 
Ras that abolish the GTPase activity result in constitu- 
tively active forms of the protein. Such oncogenically 
mutated forms of Ras, particularly Ki4B-Ras, are found 
in approximately 30% of many human cancers including 
90% of pancreatic cancers and 50% of colon cancers 4 ^. 

Ras is synthesized as a biologically inactive, cytosolic 
protein that localizes to the inner surface of the plasma 
membrane where it acquires biological activity follow-
ing a series of post-translational modifications (see Ref. 6
for review). The first and obligatory step in this series is
the transfer of a 15-carbon isoprenoid, farnesyl, from far-
nesyl diphosphate (FPP) to the sulphur atom of the cys-
teine residue located four amino acids from the carboxyl
terminus of the protein. This cysteine residue is part of
the CA1A2X motif found in all FPTase protein substrates,
where C is cysteine, A1 and A2 are usually aliphatic
amino acids and X is usually serine, methionine, gluta-
mine, alanine or cysteine. Following farnesylation,
A1A2X is proteolytically cleaved and the now C-terminal
farnesylcysteine is methylated. In the case of all of the
Ras proteins except Ki4B-Ras, palmitate groups are then
added to cysteine residues upstream of the farnesylated
cysteine. The demonstration that farnesylation is essen-
tial for the transforming ability of the Ras oncopro-
teins (Refs 7-10) has spurred the development of inhibitors of
the enzyme that catalyses this reaction, FPTase, as anti-
cancer agents.
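
The motif described above can be expressed as a simple pattern check on the last four residues of a protein sequence. The following is a minimal sketch only: the function names are hypothetical, and the choice of Ala, Val, Leu and Ile as the "aliphatic" set is an assumption, since the text says only that A1 and A2 are usually aliphatic.

```python
import re
from typing import Optional

# Assumption: "aliphatic" is taken here as Ala, Val, Leu or Ile. The text says
# only "usually aliphatic", so real substrates may fall outside this set.
ALIPHATIC = "AVLI"
# X residues named in the text: Ser, Met, Gln, Ala or Cys.
X_RESIDUES = "SMQAC"

# C, then two aliphatic residues, then X, anchored at the C-terminus.
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}]{{2}}[{X_RESIDUES}]$")


def has_caax_motif(protein_sequence: str) -> bool:
    """Return True if the last four residues fit the C-A1-A2-X pattern."""
    return CAAX_PATTERN.search(protein_sequence.upper()) is not None


def farnesylated_cysteine_index(protein_sequence: str) -> Optional[int]:
    """Return the 0-based index of the cysteine four residues from the
    C-terminus, or None when the sequence does not end in the motif."""
    if not has_caax_motif(protein_sequence):
        return None
    return len(protein_sequence) - 4
```

For an illustrative C-terminal fragment such as `GCMSCKCVLS`, the check succeeds and the reported cysteine is the one at index 6; a fragment that does not end in the pattern returns `None`.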

FPTase is a ubiquitously expressed, cytosolic enzyme 
comprised of two subunits, a 45 kDa α subunit and a
48 kDa β subunit (Ref. 6). Cross-linking studies have shown
Structure and Functional Analysis of G Protein-Coupled
Receptors and Potential Diagnostic Ligands

Claire M. Fraser 

The Institute for Genomic Research, Gaithersburg, Maryland 



G protein-coupled receptors are a diverse class of proteins 
that mediate signal transduction across the plasma mem- 
brane. More than 200 receptors in this extended gene family 
have been cloned, and comparison of the deduced amino- 
acid sequences indicates that these proteins have marked 
homology and share a common membrane topology con- 
sisting of seven transmembrane helices. Although there is 
considerable variability in the physiologic ligands responsi- 
ble for receptor activation, all receptors in this group interact 
with trimeric, guanine nucleotide-binding proteins to initiate 
signaling cascades in the cell cytosol. To investigate the 
structural motifs responsible for ligand binding, we have es- 
tablished a model system to express heterologously human 
G protein-coupled receptors in a mammalian cell line. This 
experimental system allows each receptor subtype to be 
studied in isolation and provides a direct means to link re- 
ceptor activation to a particular second messenger cascade. 
Furthermore, the efficacy and specificity of new pharmaceu- 
ticals can now be evaluated readily with cloned human re- 
ceptors, eliminating the need for animal tissues. We have 
used this expression system in conjunction with an experi- 
mental strategy of site-directed mutagenesis to identify 
amino-acid residues that have a functional role in ligand 
binding. Because of the strong homology that exists within 
this family of receptor proteins, the results of this work are 
applicable to other systems and, therefore, can help to es- 
tablish a more complete understanding of ligand-receptor 
interactions. This combined molecular and biochemical ap- 
proach to the study of G protein-coupled receptors can pave 
the way for the development of isoform-specific ligands that 
may be used for radionuclide imaging and therapy. 

J Nucl Med 1995; 36(Suppl):17S-21S 
Cell surface receptors are integral membrane proteins
that connect external stimuli to biochemical changes
within the cell. These proteins can be grouped into three
superfamilies based on their primary structures and mech-
anisms of action:

1. Receptors that bind growth factors. 

2. Ligand-gated ion channels, such as the nicotinic, 
gamma-aminobutyric acid (GABA) and glycine re- 
ceptors. 

3. Receptors that interface with guanine nucleotide- 
binding regulatory proteins. 

The third group, G protein-coupled receptors, is a diverse 
collection of proteins that includes distinct receptor sub- 
families activated by peptide hormones, neurotransmit- 
ters, or environmental stimuli (Table 1). 

Although G protein-coupled receptors have different 
physiologic activators, they have two unifying character- 
istics: 

1. Each protein contains seven stretches of high hydro-
phobicity that appear to form membrane-spanning
segments. Therefore, all receptors in this class are 
thought to share a similar membrane topology, anal- 
ogous to the structure of bacteriorhodopsin (Fig. 1). 
This proposed topology has been confirmed for both 
rhodopsin (1) and the beta-adrenergic receptor (2)
through the use of antipeptide antibodies directed 
against specific regions of the receptor protein. 

2. In each system, receptor stimulation causes the acti- 
vation of a trimeric G protein on the cytosolic sur- 
face of the plasmalemma (3). Interaction with a G 
protein, therefore, is the common primary step of 
each signalling cascade. In the activated state, the 
G alpha subunit dissociates from the beta-gamma 
complex. Diversification of the biochemical re- 
sponse is caused by the subsequent modulation of 
additional effector enzymes by G alpha (Fig. 2). 
These downstream elements may include: phospho- 
lipases A, C, or D; adenylate or guanylate cyclase; 
or other proteins, such as ion channels. 
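
The first characteristic above, seven stretches of high hydrophobicity, is the kind of feature that a sliding-window hydropathy scan is designed to pick out. The sketch below is illustrative only and is not the method used in this article: the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold and the function names are all assumptions introduced for the example.

```python
# Kyte-Doolittle hydropathy values (assumption: the text does not name a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def count_hydrophobic_stretches(sequence, window=19, threshold=1.6):
    """Count runs of consecutive windows whose mean exceeds the threshold."""
    stretches = 0
    inside_stretch = False
    for score in hydropathy_profile(sequence, window):
        above = score > threshold
        if above and not inside_stretch:
            stretches += 1  # a new hydrophobic run starts here
        inside_stretch = above
    return stretches
```

A count of seven from such a scan is only suggestive of the proposed topology; as the text notes, the topology itself was confirmed experimentally with antipeptide antibodies.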

Pharmacologic analysis over the past 10 to 15 yr sug- 
gested that many receptor classes were, in fact, a group 
of closely related isoforms. This premise was supported 
by the observation that a specific ligand, such as acetyl- 
choline, could elicit distinct biochemical responses in dif- 
ferent tissues. Moreover, the sensitivity of receptors to 
TABLE 1

Membrane Receptors That Interact with G Proteins

Peptide Hormone Receptors

Angiotensin
Adrenocorticotropin (ACTH)
Antidiuretic hormone
Bombesin
Bradykinin
Calcitonin
Cholecystokinin (CCK)
C5a anaphylatoxin
Corticotropin-releasing hormone (CRF)
Endothelin
Gastrin
Glucagon
Glucagon-like peptide
Gonadotropin-releasing hormone (GnRH)
Growth hormone-releasing hormone (GRF)
Interleukin-8
Kinins (bradykinin, substances P and K)
Luteinizing hormone (LH)
Melanocortin
Melanocyte-stimulating hormone (MSH)
N-formyl peptide
Neuropeptide tyrosine (NPY)
Neurotensin
Opiates
Oxytocin
Parathyroid hormone
Pituitary adenylate cyclase-activating protein
Secretin
Somatostatin
Thyrotropin-releasing hormone (TRH)
Vasoactive intestinal polypeptide (VIP)
Vasopressin

Glycoprotein Hormone Receptors

Choriogonadotropin
Follicle-stimulating hormone (FSH)
Thyrotropin (TSH)

Neurotransmitter Receptors

Adenosine
Adenosine triphosphate (ATP)
Alpha-Adrenergic
Beta-Adrenergic
Dopamine
Gamma-aminobutyric acid (GABA)
Glutamate
Histamine
Muscarinic acetylcholine
Octopamine
Serotonin
Tyramine

Sensory Systems

Vision (rhodopsins)
Olfaction

Other Agents

Cannabinoids
Immunoglobulin E (IgE)
Mas oncogene
Platelet-activating factor
Prostanoids
Thrombin


agonists or antagonists varied with the experimental mate- 
rial. These early observations have been confirmed with 
the cloning of over 200 genes that encode G protein- 
coupled receptors (4). Comparison of the predicted pro- 
tein sequences illustrated that most receptors are part of 
a multigene family that may include as many as six iso- 
forms (4). In addition, low-stringency screening and the 
application of new molecular cloning techniques have led 
to the identification of novel receptor subtypes that were 
not previously anticipated from pharmacologic studies. 

These findings highlight one of the most challenging 
problems in the development of useful drugs for radionu- 
clide imaging and therapy: How can pharmaceuticals be 
designed and tested that are specific for a particular recep- 
tor isoform? To address this problem adequately, it is 
essential to answer the following questions: Which second 
messenger cascade is elicited by a particular receptor sub- 
type? How is the response affected by different agonists? 
It is equally apparent, from the heterogeneity of receptor 



proteins in vivo, that the answers to these questions will
require the development of new experimental systems that
can ascertain the properties of each receptor subtype. 

HETEROLOGOUS EXPRESSION 

We have used an experimental system in which cloned 
G protein-coupled receptors are stably transfected and 
expressed in a mammalian cell line (5). Heterologous 
expression of receptor proteins has two major advantages: 
analysis of a single receptor subtype in isolation and study 
of drug interactions with human receptors, eliminating 
the need for animal tissues in drug screening protocols. 
Although our research has focused on the muscarinic ace- 
tylcholine receptor, the observations concerning receptor- 
ligand interaction are relevant to any one of a number 
of G protein-coupled receptors. Hence, within this gene 
superfamily, there exists some commonality of structure 
and function. 

Five distinct muscarinic receptor genes have been 
cloned and sequenced (6) and have been designated m1
through m5. The m1, m3 and m5 subtypes preferen-
tially stimulate phosphoinositide hydrolysis in response




HUMAN BETA2-ADRENERGIC RECEPTOR

FIGURE 1. Schematic illustration of cell membrane topology
of G protein-coupled receptors with seven stretches of mem-
brane-spanning segments with high hydrophobicity. [Reprinted
with permission from: Lee NH, Fraser CM. Identifying the func-
tional domains of G protein-coupled receptors. In: Krogsgaard-
Larsen P, Christensen S, Kofod H, eds. New leads and targets
in drug research: Alfred Benzon symposium no. 33. Copenha-
gen: Munksgaard; 1992:187-199.]


FIGURE 2. Schematic illustration of the signal transduction mechanisms common to G protein-coupled receptors. Agonist
binding to G protein-coupled receptors promotes receptor coupling to various heterotrimeric G proteins, which catalyze the
exchange of bound guanosine diphosphate (GDP) for guanosine triphosphate (GTP) on the G protein alpha subunits. Binding of
GTP to the alpha subunits results in dissociation of the G protein heterotrimer complex. Depending on the receptor and the G
protein with which it is associated, the Gα-GTP subunit activates (+) or inhibits (-) one or more intracellular effector enzymes,
leading to metabolic changes in the cell. Acetylcholine binds to subtypes of muscarinic acetylcholine receptors indicated as M1
and M2. Norepinephrine and epinephrine bind to subtypes of alpha- (α) and beta- (β) adrenergic receptors. The heterotrimeric G
proteins are composed of alpha, beta and gamma subunits. Effector enzymes stimulated or inhibited by G protein-coupled
receptors include: (a) adenylate cyclase (AC), which converts adenosine triphosphate (ATP) to cyclic adenosine monophosphate
(cAMP) and activates protein kinase A (PKA); (b) phospholipase C (PLC), which hydrolyzes inositol phospholipids to produce
inositol phosphates (IP3) and diacylglycerol (DG) [inositol phosphates increase the levels of intracellular calcium and diacylglycerol
stimulates protein kinase C (PKC) activity]; (c) phospholipase A2, which hydrolyzes membrane lipids to produce arachidonic acid;
and (d) various ion channels, which modulate ion flow across the cell membrane.
to agonist binding, whereas the m2 and m4 subtypes
preferentially inhibit adenylate cyclase (6). Each recep-
tor, however, can activate more than one intracellular
signaling pathway under the appropriate conditions. 
For example, the phosphoinositide-coupled muscarinic 
receptors have been shown to mediate an increase in 
intracellular cyclic adenosine monophosphate (cAMP) 
and also may stimulate the release of arachidonic acid 
from membranes (7,8). 

Equally interesting are the differences observed in the 
magnitude of the responses elicited by receptors that 
activate the same second messenger cascade (7). The m1
and m3 isoforms both stimulate the phosphoinositide
pathway. Yet, comparison of the m1 and m3 subtypes,
expressed in Chinese hamster ovary (CHO) cells, illus-
trated that the phosphoinositide response evoked by ago-
nist binding to the m1 muscarinic receptor was always
greater than that observed with the m3 receptor. These 
differences were not due to dissimilarities in the level of 
gene expression since both receptors were present at the 
plasma membrane in equivalent densities. It is not clear 
whether this difference reflects the coupling of these two 
receptor subtypes to distinct G proteins or a differential 
coupling to a single G protein. Nevertheless, these obser- 
vations suggest that there may be physiologically relevant 
differences in the coupling of receptor isoforms to the 
same biochemical pathway. 



We have also observed agonist-specific activation of 
intracellular signalling pathways (9). Three muscarinic 
agonists — carbachol, pilocarpine, and AF102B — were 
examined for their ability to stimulate phosphoinositide 
hydrolysis, cAMP production and arachidonic acid re-
lease from CHO cells transfected with the m1 muscarinic
receptor. Carbachol and pilocarpine produced maximal 
stimulation of phosphoinositide hydrolysis. This response 
was greater than the phosphoinositide hydrolysis elicited 
by AF102B. Similar results were found when arachidonic 
acid release was monitored. In contrast, only carbachol 
produced an increase in the level of cytosolic cAMP, 
whereas pilocarpine and AF102B had no effect on this 
pathway. These data support findings from other studies 
with m1 muscarinic receptors (10).

Comparison of the chemical structure of these ago- 
nists suggests one plausible explanation for the diverse 
response of the m1 receptor: Carbachol is the com-
pound with the most flexibility since it can assume four
or five conformational states that have similar energy
minima (9). Multiple conformational states may allow
distinct ligand-receptor interaction, which might ac- 
count for the diversity observed in the biochemical re- 
sponse. Pilocarpine and AF102B, on the other hand, 
have more rigid chemical structures, which may limit 
the ability of these compounds to stimulate completely 
the m1 receptor. Interestingly, it has been postulated
that the ability of AF102B to function as a partial ago-
nist may be important therapeutically because patients
may not develop tolerance to this compound (9). This 
drug is currently in clinical trials in Japan as a treatment 
for Alzheimer's disease. 

DOWNREGULATION 

The phenomenon of receptor downregulation and its 
relation to drug tolerance is a serious problem in the 
development of therapeutics. It has been shown that long- 
term incubation of many G protein-coupled receptors with 
an agonist produces a reduction in the number of receptor 
proteins at the cell surface (4). This process is called 
receptor downregulation. One clinical manifestation of 
this phenomenon is tachyphylaxis, observed in asthma 
patients who use beta-adrenergic agonists, such as 
bronchial dilators. Chronic administration of these beta- 
adrenergic agonists may cause patients eventually to be- 
come refractory to the agonist's beneficial effects. 

We have examined the biochemical and molecular fea- 
tures of receptor downregulation in CHO cells transfected 
with the m1 muscarinic acetylcholine receptor, the alpha2-
adrenergic receptor, and the beta-adrenergic receptor (11).
Following 24-hr incubation with carbachol, a muscarinic 
agonist, transfected CHO cells showed a reduction in the 
magnitude of phosphoinositide hydrolysis elicited by re- 
application of carbachol in comparison to cells that had 
no prior carbachol exposure. Interestingly, the addition 
of isoproterenol, a beta-adrenergic agonist, also caused a 
reduction in the carbachol-induced phosphoinositide re-
sponse of the muscarinic receptor. Therefore, long-term
activation of either G protein-coupled receptor (the mus- 
carinic or the beta-adrenergic receptor) caused downregu- 
lation of the muscarinic receptor. The reduction in recep- 
tor density at the cell surface correlated with a decrease 
in the level of messenger ribonucleic acid (mRNA) spe- 
cific for the muscarinic receptor. 

These observations indicate that receptor downregula- 
tion is in part a biochemical feedback process that reduces 
the level of gene transcription in response to receptor 
stimulation. These findings may be important in the long- 
term therapy of diseases with some of these agonists and 
represent a possible utility for radionuclide imaging as a 
technique to monitor changes in receptor levels in target 
tissues. 

SITE-DIRECTED MUTAGENESIS 

Along with the biochemical analysis of G protein-cou- 
pled receptors, we have used this heterologous expression 
system to identify structures within receptor protein that 
have functional importance (4). These studies employed 
an experimental strategy of site-directed mutagenesis fol- 
lowed by expression of the mutant receptor protein in 
transfected cells to define regions responsible for ligand 
binding and receptor activation by agonists. A similar 



strategy has been utilized in other laboratories to deter-
mine receptor domains that interact with G proteins (4)
and amino acid residues that undergo post-translational
modifications, such as glycosylation, which may be essen-
tial for normal receptor function (4).

We have focused on amino acid residues that are highly 
conserved among all G protein-coupled receptors and po-
sitioned toward the extracellular membrane surface where
ligand binding is thought to occur. One caveat to this
experimental approach is the possibility that a point muta- 
tion will cause a large-scale conformational change in the 
protein. In such a case, receptor inactivity may be caused 
by protein misfolding and not because the mutated residue
had a critical role in receptor function. To minimize this 
problem, we have made conservative amino acid substitu- 
tions, replacing the original residue with one of similar 
size and/or hydrophobicity. 
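
As a rough illustration of the bookkeeping behind such an experiment, the sketch below applies a single conservative substitution to a protein sequence and refuses to proceed if the residue at the stated position is not the one expected. The function name, the table of allowed pairs and the example peptide are hypothetical; the two pairs listed are the substitutions this article reports (cysteine to serine, aspartic acid to asparagine).

```python
# Conservative substitutions reported in this article, as one-letter codes.
# Any other pair would need its own justification (assumption for this sketch).
CONSERVATIVE_PAIRS = {("C", "S"), ("D", "N")}


def point_mutation(sequence, position, original, replacement):
    """Return a copy of `sequence` with the residue at 1-based `position`
    changed from `original` to `replacement`.

    Raises ValueError if the position is out of range, if the residue found
    there is not `original`, or if the pair is not listed as conservative.
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    if (original, replacement) not in CONSERVATIVE_PAIRS:
        raise ValueError(f"{original}->{replacement} is not a listed conservative pair")
    return sequence[:index] + replacement + sequence[index + 1:]
```

With a made-up peptide `MKTCDLW`, `point_mutation("MKTCDLW", 4, "C", "S")` yields `MKTSDLW`, leaving every other residue untouched so that any change in receptor behaviour can be attributed to that one position.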

One of the striking features of most receptors in this 
family is the presence of two conserved cysteine residues 
(4), one in the extracellular loop between helices II and
III and a second in the extracellular loop between helices
IV and V (Fig. 1). Biochemical evidence from a number
of G protein-coupled receptor systems has suggested that 
these cysteines may form a disulfide linkage, covalently 
connecting the two extracellular loops (4,12). We have 
made mutations at each position, changing the cysteine 
to a serine residue, in the muscarinic acetylcholine recep- 
tor. In each case, the transfected cells expressed the mu- 
tant receptor, as evidenced by Northern analysis (13), but 
no agonist-mediated increase in phosphoinositide hydro- 
lysis could be observed. 

These results confirm the earlier biochemical studies 
and also suggest that disulfide formation is essential for 
maintaining the correct protein conformation required for 
recognizing ligands and receptor activation. 

The precise location of the ligand binding-site has yet 
to be determined. Earlier work implied that ligands were 
bound within the transmembrane domains, since large 
deletions in the beta-adrenergic receptor could be made 
in either the extracellular or cytosolic loops without af- 
fecting ligand-receptor association (14). In light of these 
findings, we began to look at these domains and specific 
amino acids within the transmembrane helices, asking 
whether these residues had a role in ligand binding. Align- 
ment of the deduced amino acid sequences from a number 
of G protein-coupled receptors revealed that a single 
aspartic acid residue within helix III is absolutely con- 
served among all receptors that bind ligands with a posi- 
tively charged nitrogen. Examples include the following 
proteins: muscarinic receptors that bind acetylcholine, ad- 
renergic receptors that bind epinephrine and norepineph- 
rine, dopamine receptors, serotonin receptors, and hista- 
mine receptors. Moreover, it has been postulated that this 
negatively charged aspartic acid may play a role in bind- 
ing the positively charged nitrogen common among these 
ligands (15).






The Journal of Nuclear Medicine • Vol. 36 • No. 6 (Suppl) • June 1995 






We have examined the role of this aspartic acid by
mutating it to an asparagine in beta- and alpha-adrenergic
receptors and in the muscarinic acetylcholine receptor. All
three mutant receptors were unable to bind radiolabeled
ligands, whereas the wildtype proteins displayed a high-
affinity, saturable binding of the appropriate compound
(16). Our findings corroborate results published by Hulme
et al. (17), which determined that this same aspartic acid
residue in the muscarinic receptor was covalently linked
to the radioactive affinity-probe, propylbenzilylcholine
mustard.

Work from our laboratory and others has also impli-
cated transmembrane threonine, tyrosine and cysteine res-
idues in agonist binding, although it is not yet known
whether any of these residues directly participate in recep-
tor-ligand interactions (18,19). All of these amino acids
are located in the same plane of the membrane, within 
one to two turns of the alpha helix from the extracellular 
surface, supporting the idea that agonist binding may oc- 
cur within the upper third of the transmembrane helices. 

CONCLUSION 

A combined approach of heterologous gene expression 
and site-directed mutagenesis provides a starting point for 
future structure-function analysis of G protein-coupled 
receptors. These studies, along with efforts toward ob- 
taining a receptor crystal structure, may make it possible 
to design more selective ligands for radionuclide imaging 
and therapy. 
Summary

Bile acids repress the transcription of cytochrome 
P450 7A1 (CYP7A1), which catalyzes the rate-limiting 
step in bile acid biosynthesis. Although bile acids acti- 
vate the farnesoid X receptor (FXR), the mechanism 
underlying bile acid-mediated repression of CYP7A1 
remained unclear. We have used a potent, nonsteroi- 
dal FXR ligand to show that FXR induces expression 
of small heterodimer partner 1 (SHP-1), an atypical 
member of the nuclear receptor family that lacks a 
DNA-binding domain. SHP-1 represses expression of 
CYP7A1 by inhibiting the activity of liver receptor ho- 
molog 1 (LRH-1), an orphan nuclear receptor that is 
known to regulate CYP7A1 expression positively. This 
bile acid-activated regulatory cascade provides a 
molecular basis for the coordinate suppression of 
CYP7A1 and other genes involved in bile acid biosyn- 
thesis. 
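
The regulatory cascade summarized above can be written out as a chain of on/off steps. This is a qualitative toy model only: treating each step as all-or-none is an assumption made for illustration, the function and parameter names are hypothetical, and nothing here reflects measured expression levels.

```python
def cyp7a1_transcribed(bile_acids_present, lrh1_present=True):
    """Follow the cascade one step at a time and report whether CYP7A1
    transcription is on (True) or repressed (False)."""
    fxr_active = bile_acids_present        # bile acids activate FXR
    shp1_expressed = fxr_active            # active FXR induces SHP-1
    # LRH-1 drives CYP7A1 expression unless SHP-1 inhibits its activity.
    lrh1_active = lrh1_present and not shp1_expressed
    return lrh1_active
```

In this simplified picture, `cyp7a1_transcribed(False)` is `True` and `cyp7a1_transcribed(True)` is `False`, which restates the feedback repression of CYP7A1 by bile acids described in the summary.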

Introduction 

Cholesterol is essential for a number of cellular func- 
tions, including membrane biogenesis and steroid hor- 
mone and bile acid biosynthesis. However, in excess, 
cholesterol can contribute to disease processes such 
as atherosclerosis and gallstone formation. Therefore, 
cholesterol biosynthesis and catabolism must be coor- 
dinately regulated. The metabolism of cholesterol to bile 
acids represents a major pathway for its elimination 
from the body, accounting for approximately half of daily 
excretion. Cytochrome P450 7A (CYP7A1) is a liver-spe-
cific enzyme that catalyzes the first and rate-limiting
step in one of the two pathways for bile acid biosynthesis 
(Chiang, 1998; Russell and Setchell, 1992). The gene 
encoding CYP7A1 is regulated by a variety of small, 
lipophilic molecules, including steroid and thyroid hor- 
mones, cholesterol, and bile acids. Notably, CYP7A1 
expression is stimulated by cholesterol feeding and re- 
pressed by bile acids. Thus, CYP7A1 is under both feed- 
forward and feedback regulation. 
CYP7A1 expression is regulated by several members 
of the nuclear receptor superfamily of ligand-activated
transcription factors (Chiang, 1998; Gustafsson, 1999;
Russell, 1999). Recently, two nuclear receptors, the liver
X receptor α (LXRα; NR1H3) (Apfel et al., 1994; Willy et
al., 1995) and farnesoid X receptor (FXR; NR1H4)
(Forman et al., 1995; Seol et al., 1995), were implicated
in the feedforward and feedback regulation of CYP7A1,
respectively (Peet et al., 1998; Russell, 1999). Both LXRα
and FXR are abundantly expressed in the liver and
bind to their cognate hormone response elements as
heterodimers with the 9-cis retinoic acid receptor
RXR (Mangelsdorf and Evans, 1995). LXRα is activated
by the cholesterol derivative 24,25(S)-epoxycholesterol
and binds to a response element in the CYP7A1 pro-
moter (Lehmann et al., 1997). Mice lacking LXRα do not
induce CYP7A1 expression in response to cholesterol
feeding (Peet et al., 1998). Moreover, these animals ac-
cumulate massive amounts of cholesterol in their livers
when fed a high cholesterol diet. These data establish
LXRα as the cholesterol sensor responsible for feedfor-
ward regulation of CYP7A1 expression.

Bile acids stimulate the expression of genes involved 
in bile acid transport, such as the intestinal bile acid-
binding protein (I-BABP), and repress CYP7A1 and other
genes encoding enzymes involved in bile acid biosyn-
thesis, such as CYP8B1, which converts chenodeoxy-
cholic acid (CDCA) to cholic acid, and CYP27, which
catalyzes the first step in the alternative, "acidic" path-
way for bile acid synthesis (Russell and Setchell, 1992;
Javitt, 1994; Russell, 1999). Recently, FXR was shown
to be a bile acid receptor (Wang et al., 1996; Makishima
et al., 1999; Parks et al., 1999). Several different bile
acids, including CDCA and its glycine and taurine conju-
gates, bind and activate FXR at physiologic concentra-
tions. Moreover, FXR response elements (FXREs) were
identified in both the mouse and human I-BABP promot-
ers (Grober et al., 1999; Makishima et al., 1999), which
provided strong evidence that FXR mediates the posi-
tive effects of bile acids on I-BABP expression. Notably,
the rank order of bile acids that activate FXR correlates
with that for repression of CYP7A1 in a hepatocyte-
derived cell line (Makishima et al., 1999). These data
suggested that FXR also has a role in the negative ef-
fects of bile acids on gene expression. However, since
the region of the CYP7A1 promoter that is necessary
for bile acid-mediated repression lacks a strong FXR-
binding site (Chiang and Stroup, 1994; Chiang et al.,
2000), it seemed unlikely that this repression was a di- 
rect effect of FXR. Thus, the molecular mechanism for 
bile acid-mediated repression of CYP7A1 remained 
in question. 

In this report, we have used a potent, nonsteroidal FXR 
ligand to demonstrate that FXR regulates the hepatic 
expression of small heterodimer partner 1 (SHP-1; 
NR0B2), an atypical, orphan member of the nuclear re- 
ceptor family that lacks a DNA-binding domain (Seol et 
al., 1996). SHP-1 has been shown to bind to other nu- 
clear receptors and to repress their transcriptional activ- 
ities (Seol et al., 1996; Masuda et al., 1997; Johansson 
et al., 1999; Lee et al., 2000). We show that SHP-1 re- 
presses the CYP7A1 promoter through interaction with 
liver receptor homolog 1 (LRH-1; NR5A2), an orphan 
nuclear receptor that binds as a monomer to a response 
element in the CYP7A1 promoter and activates tran- 
scription (Becker-Andre et al., 1993; Galarneau et al., 
1996; Nitta et al., 1999). LRH-1 is a mammalian homolog 
of the Drosophila fushi tarazu F1 gene product, which 
regulates Drosophila metamorphosis (Lavorgna et al., 
1991; Broadus et al., 1999). Our findings define a novel 
regulatory cascade of three orphan nuclear receptors 
that provides a molecular basis for the coordinate re- 
pression of gene expression by bile acids. 

Results 

Identification of GW4064 as a Potent, 
Selective FXR Activator 

FXR was recently shown to be a receptor for CDCA as 
well as other bile acids (Makishima et al., 1999; Parks et 
al., 1999; Wang et al., 1999). However, these compounds 
bind to FXR with only micromolar affinities and at these 
concentrations also interact with other proteins, includ- 
ing bile acid-binding proteins and transporters. We 
sought to identify a potent, selective FXR ligand for use 
as a chemical tool in elucidating the genes regulated 
by FXR. Combinatorial libraries of compounds were 
screened using a ligand-sensing fluorescence reso- 
nance energy transfer assay that detects interactions 
between FXR and a peptide derived from the steroid 
receptor coactivator 1 (SRC-1) as previously described 
(Parks et al., 1999). Among the compounds that pro- 
moted an interaction between FXR and SRC-1 was the 
isoxazole GW4064 (Figure 1A), which bound to FXR with 
a half-maximal effective concentration (EC50) of 15 nM 
(Maloney et al., 2000). GW4064 activated mouse and 
human FXR with EC50 values of 80 and 90 nM, respec- 
tively, in CV-1 cells transfected with FXR expression 
vectors and a reporter plasmid containing two copies 
of an established FXR response element (FXRE) derived 
from the Drosophila heat shock protein 27 (hsp27) pro- 
moter (Forman et al., 1995) (Figure 1B). Thus, GW4064 
is ~1000-fold more potent than CDCA in activating FXR 
in CV-1 cells (Figure 1B). 
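
The potency comparison above can be illustrated with a short calculation. The following is a minimal sketch, assuming a simple Hill dose-response model; the function names are hypothetical, and the CDCA EC50 used here is a placeholder chosen only to reproduce a ~1000-fold ratio, since no CDCA EC50 is stated in the text. Only the 90 nM value for human FXR comes from the text.

```python
def hill_response(conc_nM, ec50_nM, top=1.0, hill=1.0):
    """Fractional reporter response under a simple Hill model (an assumption)."""
    return top * conc_nM**hill / (ec50_nM**hill + conc_nM**hill)


def fold_potency(ec50_reference_nM, ec50_test_nM):
    """How many times more potent the test compound is than the reference."""
    return ec50_reference_nM / ec50_test_nM


ec50_gw4064_nM = 90.0      # human FXR in CV-1 cells, from the text
ec50_cdca_nM = 90_000.0    # hypothetical placeholder, not a measured value

# By definition, the response at the EC50 is half of the maximum.
print(hill_response(ec50_gw4064_nM, ec50_gw4064_nM))
print(fold_potency(ec50_cdca_nM, ec50_gw4064_nM))
```

A lower EC50 means higher potency, so the fold difference is the ratio of the reference EC50 to the test EC50.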

GW4064 was tested for selectivity against a panel 
of nuclear receptors. CV-1 cells were transfected with 
expression plasmids for various nuclear receptor-GAL4 
chimeras and the reporter plasmid (UAS)5-tk-CAT as 
previously described (Parks et al., 1999). GW4064 acti- 
vated only the FXR-GAL4 chimera (Figure 1C). Thus, 
GW4064 is a highly selective activator of FXR. 

FXR Regulates SHP-1 Expression in the Liver 
GW4064 was exploited as a chemical tool to identify 
the genes regulated by FXR in the liver. Male Fisher rats 
were treated for 7 days with GW4064 or vehicle alone 
(methyl cellulose). Following treatment, RNA was pre- 
pared from the livers of GW4064- and vehicle-treated 
animals, and genes that were either induced or re- 
pressed by GW4064 treatment were determined using 
CuraGen GeneCalling differential gene expression 
technology (Shimkets et al., 1999). A comprehensive list 
of the liver genes regulated by GW4064 will be published 
elsewhere. Interestingly, the gene that was most strongly 
induced by GW4064 treatment was that encoding the 
orphan nuclear receptor SHP-1. Northern analysis showed 
that SHP-1 expression was increased ~6-fold in the 
livers of GW4064-treated rats relative to vehicle-treated 
animals (Figure 2A). 

Bile acids are known to repress the expression of 
CYP7A1 as part of a regulatory feedback loop that con- 
trols the rate of their biosynthesis from cholesterol 
(Russell and Setchell, 1992; Russell, 1999). Two recent 
studies implicate FXR in the repression of CYP7A1 
(Makishima et al., 1999; Wang et al., 1999), although the 
molecular mechanisms have remained unclear since the 
CYP7A1 promoter does not contain a consensus FXRE 
(Chiang et al., 2000). In parallel with our analysis of 
SHP-1 expression, we examined whether GW4064 treat- 
ment resulted in decreased CYP7A1 expression in male 
Fisher rats. Rats treated with GW4064 showed a sub- 
stantial decrease in CYP7A1 mRNA levels (~4-fold, Fig- 
ure 2A). Thus, GW4064 mimics the well documented 
effects of naturally occurring FXR ligands, namely bile 
acids, on CYP7A1 expression. This observation pro- 
vides compelling evidence that FXR mediates feedback 
repression of CYP7A1 by bile acids. 

Figure 1. GW4064 Is a Potent, Selective Activator of FXR 

(A) Chemical structure of GW4064. 

(B) CV-1 cells were transfected with expression plasmids for human 
or mouse FXR and the (hsp27EcRE)2-tk-LUC reporter plasmid con- 
taining two copies of the hsp27 ecdysone response element up- 
stream of the thymidine kinase (tk) promoter and luciferase gene. 
Transfected cells were treated with the indicated concentrations of 
either GW4064 or CDCA. Open circles, mouse FXR and GW4064; 
open triangles, human FXR and GW4064; closed circles, mouse FXR 
and CDCA; closed triangles, human FXR and CDCA. Data points 
represent the mean of assays performed in triplicate. 

(C) CV-1 cells were transfected with expression vectors for various 
GAL4-nuclear receptor ligand-binding domain chimeras and the 
reporter plasmid (UAS)5-tk-CAT. Transfected cells were treated with 
1 μM GW4064. Data represent the mean of assays performed in 
triplicate ± S.D. 

Figure 2. FXR Ligands Induce SHP-1 and Repress CYP7A1 Ex- 
pression 

(A) Total RNA was prepared from the livers of male Fisher rats treated 
for 7 days with GW4064 or vehicle alone. Northern analysis was 
performed using probes for rat SHP-1 and CYP7A1. Data represent 
the mean (n = 3) ± standard error of the means. The asterisk denotes 
a statistically significant difference between vehicle- and GW4064- 
treated animals; P < 0.05. 

(B) Total RNA was prepared from primary rat or human hepatocytes 
treated for 48 hr with the indicated concentrations of GW4064 or 
vehicle alone. Northern analysis was performed using probes for 
rat or human SHP-1, CYP7A1, or β-actin. 

(C) Total RNA was prepared from primary human hepatocytes 
treated for 48 hr with the indicated concentrations of CDCA. North- 
ern analysis was performed using probes for human SHP-1, 
CYP7A1, or β-actin. 

To substantiate the in vivo data and extend them to 
human hepatocytes, we examined whether SHP-1 and 
CYP7A1 expression were regulated by FXR in primary 
cultures of rat and human hepatocytes. Hepatocytes 
were treated with increasing concentrations of GW4064, 
and the levels of SHP-1 and CYP7A1 expression were 
examined by Northern blot analysis. GW4064 treatment 
markedly increased SHP-1 expression and decreased 
CYP7A1 expression in hepatocytes from both species 
in a dose-dependent fashion (Figure 2B). Similar results 
were obtained in human hepatocytes treated with the 
natural FXR ligand CDCA (Figure 2C). As expected, 
CDCA was less potent than GW4064 in its effects on 
gene expression (compare Figures 2B and 2C). These 
data strongly suggest that FXR regulates SHP-1 and 
CYP7A1 expression in both human and rodent hepato- 
cytes. Notably, there was a striking reciprocal relation- 
ship between the regulation of SHP-1 and CYP7A1 
expression: GW4064 and CDCA repressed CYP7A1 ex- 
pression at the same concentrations that were required 
to induce SHP-1 expression (Figures 2B and 2C). Since 
SHP-1 is known to heterodimerize with several other 
members of the nuclear receptor superfamily and to 
repress their transcriptional activity (Seol et al., 1996; 
Masuda et al., 1997; Johansson et al., 1999), these data 
raised the intriguing possibility that FXR-mediated in- 
duction of SHP-1 might underlie the repression of 
CYP7A1 expression (see below). 

Figure 3. Identification of FXR Binding Sites in the Human, Rat, and 
Mouse SHP-1 Promoters 

(A) Alignment of the proximal regions of the human, rat, and mouse 
SHP-1 promoters. The conserved IR1 FXR binding site is boxed. 
Conserved nucleotides are indicated by asterisks. 

(B) Electrophoretic mobility-shift assays were performed with in vitro 
synthesized human FXR and/or human RXRα as indicated and [32P]- 
labeled oligonucleotides containing the IR1 motif from the rat, 
mouse, or human SHP-1 promoters or the mouse or human I-BABP 
promoters. The positions of the shifted FXR/RXRα complex and free 
probes are indicated. 

(C) Electrophoretic mobility-shift assays were performed with in vitro 
synthesized human FXR and/or human RXRα, a [32P]-labeled oligo- 
nucleotide containing the human I-BABP FXRE, and either a 5-, 25-, 
or 75-fold excess of unlabeled oligonucleotides containing the IR1 
motifs from the human I-BABP promoter, the mouse, rat or human 
SHP-1 promoters, or a mutated derivative of the mouse SHP-1 IR1 
motif (mSHPmut). The position of the shifted FXR/RXRα complex 
is indicated. 

Figure 4. FXR Activates the Rat and Human 
SHP-1 Promoters 

HepG2 cells were transfected with the human 
FXR expression plasmid and luciferase re- 
porter plasmids containing the proximal pro- 
moters of the rat ([A], nucleotides -441 to 
+19) or human ([B], nucleotides -572 to +10) 
SHP-1 genes or the corresponding reporter 
plasmids in which the IR1 elements had been 
mutated (ΔIR1). Following transfection, cells 
were treated for 48 hr with GW4064 (1 μM) 
or CDCA (100 μM). Data represent the 
mean ± S.D. of six individual transfections. 

FXR Binds and Activates SHP-1 Promoters 
We next sought to determine whether SHP-1 expression 
is directly regulated by FXR. FXR preferentially binds as 
a heterodimer with RXR to FXREs composed of two 
nuclear receptor half-sites of consensus AG(G/T)TCA 
organized as an inverted repeat and separated by a 
single nucleotide (IR1) (Forman et al., 1995). IR1-type 
FXREs have been identified in the human and mouse 
I-BABP promoters (Grober et al., 1999; Makishima et 
al., 1999). The mouse, rat, and human SHP-1 promoters 
were examined for IR1 motifs. A highly conserved IR1- 
like element was identified ~300 nucleotides upstream 
of the transcription initiation site in the SHP-1 promoter 
of all three species (Figure 3A). Electrophoretic mobility- 
shift analyses demonstrated that the FXR/RXR complex 
binds efficiently to the IR1 element from the SHP-1 pro- 
moter of each species (Figure 3B). In agreement with 
earlier observations (Grober et al., 1999), the FXR/RXR 
heterodimer also bound to the mouse and human 
I-BABP FXREs (Figure 3B). Competition binding analy- 
ses showed that these interactions were specific: no 
competition was seen with a mutated derivative of the 
IR1 motif derived from the mouse SHP-1 promoter (Fig- 
ure 3C). 
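
The IR1 architecture described above (two AG(G/T)TCA half-sites arranged as an inverted repeat around a single spacer nucleotide) can be written as a simple pattern search. The following is a minimal sketch, not the procedure used in this study: the function name and the demonstration sequence are hypothetical, and the pattern matches only perfect consensus sites, whereas the elements reported here are described as IR1-like and may deviate from the consensus.

```python
import re

# Consensus half-site AG(G/T)TCA as given in the text.
HALF_SITE = "AG[GT]TCA"
# Reverse complement of that half-site: TGA(A/C)CT.
HALF_SITE_RC = "TGA[AC]CT"
# IR1 = half-site, one spacer nucleotide, inverted half-site (13 nt total).
# The lookahead lets overlapping candidate sites be reported as well.
IR1_PATTERN = re.compile(f"(?=({HALF_SITE}[ACGT]{HALF_SITE_RC}))")


def find_ir1(seq):
    """Return (0-based start, matched 13-mer) for each perfect IR1 site."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in IR1_PATTERN.finditer(seq)]


# Hypothetical demonstration sequence, not taken from any real promoter.
demo = "ccctAGGTCAaTGACCTggg"
print(find_ir1(demo))
```

Only one strand needs to be scanned for perfect sites, because the set of sequences matching this pattern is closed under reverse complementation.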

The presence of an FXR/RXR binding site suggested 
that the SHP-1 gene is directly regulated by FXR. To 
test this hypothesis, HepG2 cells were transfected with 
an FXR expression plasmid and reporter plasmids ex- 
pressing luciferase under the control of either the rat or 
human SHP-1 promoters. GW4064 treatment of cells 
transfected with the FXR expression plasmid and either 
promoter construct resulted in a marked induction of 
reporter activity (Figures 4A and 4B). Based on Northern 
blot analysis of SHP-1 expression (Figure 2B), the mag- 
nitude of the response from the rat (7-fold) and human 
(3-fold) SHP-1 promoters was somewhat lower than ex- 
pected, and it is possible that other promoter or enhancer 
elements contribute to the regulation of SHP-1 expres- 
sion. Alternatively, additional factors present in well differ- 
entiated cultures of rat hepatocytes but not HepG2 cells 
may be required for maximal FXR responsiveness. In 
the absence of exogenously expressed FXR, the rat and 
human SHP-1 promoters exhibited a modest (~1.5-fold) 
induction on exposure to GW4064, which is most likely 
due to endogenous FXR in HepG2 cells (data not shown). 
FXR responsiveness was eliminated when mutations 
were introduced into the IR1 motifs in either the rat or 
human SHP-1 promoters (Figures 4A and 4B). These 
data provide strong evidence that SHP-1 expression 
is regulated directly by the FXR/RXR heterodimer in 
multiple species. 

SHP-1 Interacts with Orphan Nuclear 
Receptor LRH-1 

The finding that SHP-1 expression is regulated by FXR 
together with the reciprocal relationship between SHP-1 
and CYP7A1 regulation (Figure 2) suggested that SHP-1 
might play a pivotal role in bile acid-mediated repression 
of CYP7A1 expression. Regulation of the CYP7A1 pro- 
moter is complex and involves numerous transcription 
factors, including nuclear receptors with known ligands 
such as the thyroid hormone receptor (TR), retinoic acid 
receptor (RAR), RXR and LXRα, and the orphan recep- 
tors COUP-TFII, HNF4α, and LRH-1 (Lehmann et al., 
1997; Stroup et al., 1997; Chiang, 1998; Peet et al., 1998; 
Nitta et al., 1999; Russell, 1999; Stroup and Chiang, 
2000). SHP-1 has previously been shown to bind to and 
repress the transcriptional activities of TR, RAR, and 
RXR in the presence of their ligands and HNF4α in the 
absence of any exogenous ligand (Seol et al., 1996; 
Masuda et al., 1997). Using a mammalian two-hybrid 
approach, we examined whether SHP-1 interacts with 
these and other nuclear receptors that have been impli- 
cated in the regulation of CYP7A1. CV-1 cells were trans- 
fected with an expression plasmid for a GAL4-SHP-1 
chimera, the (UAS)5-tk-CAT reporter, and expression 
plasmids for chimeras between the strong transcrip- 
tional activation domain of VP16 and the isolated ligand- 
binding domains of a panel of nuclear receptors (Figure 
5A). When transfected alone, the GAL4-SHP-1 chimera 
caused a minor reduction (~0.3-fold) in reporter activity 
(Figure 5A). However, reporter activity was strongly in- 
duced when GAL4-SHP-1 was coexpressed with VP16- 
RXRα (~44-fold) or VP16-estrogen receptor α (ERα, 
~11-fold) in the presence of 9-cis retinoic acid and estra- 
diol, respectively (Figure 5A). These interactions were 
strongly dependent on the presence of ligand. Little or 
no interaction was detected between SHP-1 and LXRα, 
FXR, COUP-TFII, HNF4α, RARα, or TRβ in our mamma- 
lian two-hybrid assay (Figure 5A). The lack of a stronger 
interaction between SHP-1 and either TRβ, RARα, or 
HNF4α was surprising in light of the previous results of 
others (Seol et al., 1996; Masuda et al., 1997) and may 
reflect differences in the assay systems used. Notably, 
strong reporter activity was detected when GAL4-SHP-1 
was expressed with VP16-human LRH-1 or VP16-mouse 
LRH-1 (~14-fold activation for both human and mouse). 
This activity was completely dependent on the presence 
of GAL4-SHP-1 (data not shown). These data demon- 
strate that SHP-1 can interact with LRH-1 in cells. Inter- 
estingly, little or no interaction was detected between 
SHP-1 and steroidogenic factor 1 (SF-1) (Figure 5A), a 
closely related orphan receptor that shares ~60% amino 
acid identity with LRH-1 in the ligand-binding domain 
(Tsukiyama et al., 1992; Honda et al., 1993; Ikeda et al., 
1993). 

Figure 5. SHP-1 Interacts with the Orphan Nuclear Receptor LRH-1 

(A) Mammalian two-hybrid experiments were performed in CV-1 
cells cotransfected with expression plasmids for the GAL4-human 
SHP-1 chimera and various VP16-nuclear receptor ligand-binding 
domain chimeras. Transfection assays containing the LXRα-, FXR-, 
RARα-, TRβ-, ERα-, and RXRα-GAL4 chimeras were performed in 
the absence or presence of the indicated ligands [respectively: EPC, 
24(S),25-epoxycholesterol (10 μM), GW4064 (1 μM); RA, all-trans 
retinoic acid (0.1 nM); T3, triiodothyronine (0.1 μM); E2, estradiol (0.1 
μM); 9-cis RA, 9-cis retinoic acid (0.1 μM)]. Data are expressed as 
fold activation over cells transfected with the (UAS)5-tk-CAT reporter 
alone and represent the mean of assays (n = 8) ± S.D. 

(B) GST pull-down assays were performed with [35S]-labeled LRH-1 
or RXRα in the presence of GST or GST-SHP-1 as indicated. 9-cis 
retinoic acid (9-cis RA) was added to the binding reaction to a final 
concentration of 10 μM. 

Using a glutathione S-transferase (GST) pull-down 
assay, we examined whether SHP-1 binds directly to 
LRH-1. SHP-1 was expressed in E. coli as a fusion pro- 
tein with GST, and [35S]-labeled LRH-1 was synthesized 
in vitro. Glutathione-Sepharose beads efficiently copre- 
cipitated [35S]-labeled LRH-1 in the presence of GST- 
SHP-1 but not in its absence (Figure 5B). In parallel 
incubations, GST-SHP-1 interacted strongly with [35S]- 
labeled human RXRα in the presence of 9-cis retinoic 
acid (Figure 5B). These data are in close agreement with 
those derived from mammalian two-hybrid experiments 
(Figure 5A). Thus, SHP-1 interacts directly with LRH-1. 

SHP-1 Represses Expression of CYP7A1 
Does SHP-1 have a role in the repression of CYP7A1 
expression by FXR ligands? We addressed this question 
by performing cotransfection experiments with a rat 
CYP7A1 luciferase reporter plasmid (pGL3-rCYP7A1 
[-1573/+36]) containing nucleotides -1573 to +36 of 
the rat CYP7A1 promoter, which includes a conserved 
LRH-1 binding site (Nitta et al., 1999). In the absence of 
exogenously expressed LRH-1, the activity of the pGL3- 
rCYP7A1(-1573/+36) reporter was low when transiently 
transfected into HepG2 cells (data not shown). Cotrans- 
fection of increasing amounts of an LRH-1 expression 
plasmid resulted in a dose-dependent increase in re- 
porter activity (Figure 6). This LRH-1-dependent reporter 
activity was completely blocked by the cotransfection 
of SHP-1 expression plasmid (Figure 6). These data sug- 
gest that interactions between SHP-1 and LRH-1 repre- 
sent a basis for bile acid-mediated repression of 
CYP7A1 expression. 

Discussion 

The recent discovery that FXR is a bile acid receptor 
provided a great deal of insight into the molecular mech- 
anisms underlying bile acid signaling. In particular, these 
studies uncovered the mechanism whereby bile acids 
stimulate the transcription of genes, such as I-BABP, 
involved in bile acid transport. High-affinity binding sites 
for the FXR/RXR heterodimer have been identified in 
both the human and mouse I-BABP promoters (Grober 
et al., 1999; Makishima et al., 1999). By contrast, the 
mechanism underlying bile acid-mediated repression of 
CYP7A1 expression remained a puzzle, since an FXRE 
had not been identified in the bile acid response ele- 
ments of this gene (Chiang and Stroup, 1994; Chiang et 
al., 2000). We now present evidence that FXR does not 
repress CYP7A1 expression directly, but rather through 
induction of the gene encoding the orphan nuclear re- 
ceptor SHP-1, which, in turn, represses CYP7A1 expres- 
sion. Similar findings have been reported by Lu et al. 
(2000 [this issue of Mol. Cell]). Consistent with this 
model, it was recently shown that SHP-1 expression is 
markedly lower and not inducible by cholic acid in the 
livers of mice lacking FXR (Sinal et al., 2000). Taken 
together, these data provide a molecular explanation 
for the coordinate suppression of gene expression by 
bile acids. 

Figure 6. SHP-1 Represses LRH-1-Dependent Activation of the Rat 
CYP7A1 Promoter 

HepG2 cells were transfected with the rat CYP7A1 reporter plasmid, 
pGL3-rCYP7A1(-1573/+36), and the indicated amounts of LRH-1 
and/or SHP-1 expression plasmids. Data represent the mean of 
assays performed in triplicate ± S.D. 

SHP-1 Represses CYP7A1 Expression 
We encountered the orphan nuclear receptor SHP-1 as 
part of a comprehensive, unbiased effort to identify FXR 
target genes in the liver. SHP-1 expression was strongly 
induced in the livers of rats treated with the potent, 
nonsteroidal FXR ligand GW4064. SHP-1 expression 
was also markedly induced by GW4064 in primary cul- 
tures of human and rat hepatocytes, whereas CYP7A1 
expression was suppressed under the same conditions. 
The reciprocal relationship between SHP-1 and CYP7A1 
regulation, together with the established inhibitory ef- 
fects of SHP-1 on nuclear receptor activity, suggested 
that SHP-1 might repress CYP7A1 expression. Indeed, 
expression of SHP-1 repressed the activity of the rat 
CYP7A1 promoter in HepG2 cells. 

SHP-1 is unusual in that it lacks the highly conserved 
DNA-binding domain typically found in members of the 
nuclear receptor family. SHP-1 was originally cloned in 
yeast two-hybrid experiments using the orphan nuclear 
receptors CAR or PPARα as bait, but it interacts with a 
number of additional nuclear receptors, including ERα 
and ERβ, RAR, RXR, and TR (Seol et al., 1996; Masuda 
et al., 1997; Seol et al., 1998; Johansson et al., 1999). 
In each case, SHP-1 represses the ligand-induced tran- 
scriptional activity of these receptors. How does SHP-1 
repress transcription of the CYP7A1 promoter? Our data 
indicate that SHP-1 exerts much of its effect through 
interaction with the orphan nuclear receptor LRH-1. 
SHP-1 interacted strongly with LRH-1 in both a mamma- 
lian two-hybrid assay and an in vitro pull-down assay. 
Moreover, SHP-1 efficiently repressed LRH-1-depen- 
dent activation of the rat CYP7A1 promoter. LRH-1 was 
recently shown to activate the human CYP7A1 promoter 
by binding to an extended nuclear receptor half-site 
sequence that is conserved in the mouse, rat, and ham- 
ster CYP7A1 promoters (Nitta et al., 1999). Earlier stud- 
ies had defined DNA response elements in the CYP7A1 
and CYP8B1 gene promoters that conferred repression 
in response to bile acids (Chiang and Stroup, 1994; 
Chiang et al., 2000; del Castillo-Olivares and Gil, 2000). 
Notably, each of these negative bile acid response ele- 
ments contains an LRH-1 binding site. Consistent with 
these data, CYP8B1 expression was repressed 3-fold 
in Fisher rats treated with GW4064 (S. A. J., unpublished 
data). Thus, interactions between SHP-1 and LRH-1 are 
likely to be important for the coordinate repression of 
a number of genes by bile acids. Among the genes that 
may be regulated by the interaction between SHP-1 and 
LRH-1 is SHP-1 itself. An LRH-1-responsive region of 
the murine SHP-1 gene has been identified (Lee et al., 
1999). Thus, SHP-1 is likely to regulate its own expres- 
sion. This feedback regulation may provide a mecha- 
nism for attenuating the bile acid-mediated repression 
of genes by SHP-1. A model for bile acid-mediated re- 
pression of gene expression via increased SHP-1 levels 
is shown in Figure 7. 
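
The logic of the proposed cascade can be summarized in a few lines of code. The following is a deliberately simplified Boolean sketch of the model, offered only as an illustration: the function name and the on/off treatment of each component are assumptions made here, and the sketch ignores dose dependence, the other repression mechanisms discussed below, and the possible autoregulation of SHP-1.

```python
def bile_acid_cascade(bile_acids_present, lrh1_present=True):
    """Toy Boolean reading of the FXR -> SHP-1 -| LRH-1 model (illustrative only)."""
    fxr_active = bile_acids_present     # bile acids act as FXR ligands
    shp1_induced = fxr_active           # FXR/RXR induces SHP-1 via the IR1 element
    ibabp_induced = fxr_active          # FXR/RXR induces I-BABP
    # SHP-1 binds LRH-1 and blocks LRH-1-dependent promoter activation.
    lrh1_active = lrh1_present and not shp1_induced
    return {
        "SHP-1": shp1_induced,
        "I-BABP": ibabp_induced,
        "CYP7A1": lrh1_active,
        "CYP8B1": lrh1_active,
    }


for state in (False, True):
    print(state, bile_acid_cascade(state))
```

In this reading, bile acids switch on the transport gene and SHP-1 together, while the biosynthetic genes that depend on LRH-1 are switched off.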

Two recent reports showed that SHP-1 represses the 
transcriptional activation of ERα and ERβ, RXR, and the 
orphan receptor HNF4α by competing with coactivator 
binding to these receptors (Johansson et al., 1999; Lee 
et al., 2000). In addition, SHP-1 contains a strong tran- 
scriptional repressor domain in its C terminus (Lee et 
al., 2000). Furthermore, SHP-1 has been shown to inhibit 
DNA binding of RAR-RXR heterodimers (Seol et al., 
1996). Taken together, these studies suggest that SHP-1 
inhibits the transcriptional activity of nuclear receptors 
through multiple mechanisms. To date, we have been 
unable to demonstrate inhibition of LRH-1 binding to its 
response element in the CYP7A1 promoter by SHP-1 
(data not shown). Thus, the mechanism by which SHP-1 
inhibits LRH-1-mediated transactivation of the CYP7A1 
promoter remains unresolved. 

Figure 7. Model for the Feedforward and 
Feedback Regulatory Effects of Bile Acids on 
Gene Expression 

Activation of FXR by bile acids results in the 
induction of I-BABP and SHP-1 expression. 
SHP-1, in turn, interacts with LRH-1 and re- 
presses expression of CYP7A1 and CYP8B1. 
SHP-1 may also repress expression of its own 
gene. 

In addition to the interactions between SHP-1 and 
LRH-1, other mechanisms may play a role in bile acid- 
mediated repression of CYP7A1 expression. First, SHP-1 
binds to and represses the transcriptional activity of 
other nuclear receptors that regulate CYP7A1, including 
RXR and TR (Seol et al., 1996; Masuda et al., 1997). 
These interactions may also contribute to bile acid- 
mediated repression of CYP7A1 expression. Second, 
ligand-bound FXR was reported to repress LXRα activity 
on an LXRα response element (Wang et al., 1999), al- 
though the mechanism for this trans-repression is not 
clear. Since LXRα stimulates rodent CYP7A1 expression 
in response to oxysterols, repression of LXRα activity 
may contribute to the overall repression of CYP7A1. 
Thus, SHP-1/LRH-1 interactions may be one of several 
mechanisms whereby bile acids repress expression of 
CYP7A1 and other genes. 

Parallels between SHP-1/LRH-1 and Other 
Nuclear Receptor Pairs 

Intriguing parallels exist between the SHP-1/LRH-1 in- 
teraction and another pair of nuclear receptors. LRH-1 
is most closely related to the orphan receptor SF-1, 
which regulates the expression of enzymes required for 
steroid hormone biosynthesis (Parker, 1998; Hammer 
and Ingraham, 1999). SF-1 and LRH-1 are ~85% identi- 
cal in the amino acid sequences of their DNA-binding 
domains, and both bind as monomers to the same ex- 
tended nuclear receptor half-site sequence. Notably, the 
transcriptional activity of SF-1 is repressed by binding 
to DAX-1 (dosage-sensitive sex-reversal adrenal hypo- 
plasia congenital region on the X chromosome, region 
1; NR0B1), an orphan nuclear receptor most closely 
related to SHP-1 that also lacks the DNA-binding domain 
characteristic of nuclear receptors (Zanaria et al., 1994; 
Hammer and Ingraham, 1999). Thus, both SF-1 and 
LRH-1 are negatively regulated in a trans-dominant fash- 
ion by heterodimerization with orphan receptors lacking 
DNA-binding domains. Since SHP-1 expression is stim- 
ulated by bile acids, it will be interesting to determine 
whether DAX-1 expression is also regulated by hor- 
mones. 

A second nuclear receptor pair with similarities to 
SHP-1/LRH-1 occurs in Drosophila. Hormonal activation 
of the ecdysone receptor (EcR) during the third larval 
instar phase of Drosophila metamorphosis results in 
an increase in the expression of two orphan nuclear 
receptors, DHR3, which has a functional DNA-binding 
domain, and E75B, which does not. E75B binds to DHR3 
and represses its transcriptional activity (Thummel, 
1997; White et al., 1997). This interaction is critical for 
determining the temporal progression of metamorpho- 
sis. The EcR/E75/DHR3 and FXR/SHP-1/LRH-1 regula- 
tory cascades are remarkably similar in that hormone- 
mediated activation of a nuclear receptor (either FXR or 
EcR) induces expression of a second nuclear receptor, 
which, in turn, binds to and represses the activity of a 
third nuclear receptor. The similarities in these genetic 
hierarchies across evolution suggest that repression via 
heterodimerization may represent an important para- 
digm for the modulation of orphan receptor activity. 

Conclusions 

The mechanism whereby FXR represses expression of 
CYP7A1 and other genes has until now remained an 
enigma. Through the use of a potent, nonsteroidal FXR 
ligand, we have identified SHP-1 as an FXR target gene 
in the liver of humans and rodents. Furthermore, we 
have demonstrated that SHP-1 can interact with LRH-1 
and efficiently repress expression of CYP7A1. Thus, bile 
acid-induced repression of CYP7A1 is mediated by a 
novel regulatory cascade of three nuclear receptors. 
Since both the CYP7A1 and CYP8B1 gene promoters 
contain LRH-1 binding sites, the SHP-1/LRH-1 partner- 
ship is likely to have broad implications in bile acid 
signaling. Both SHP-1 and LRH-1 are orphan receptors, 
which raises the possibility that bile acid biosynthesis 
will be regulated by additional, unidentified hormones. 
Regardless of whether SHP-1 and LRH-1 have natural 
ligands, pharmacologic modulation of their interaction 
represents an exciting new opportunity for the discovery 
of drugs that regulate cholesterol homeostasis. 

Experimental Procedures 
Materials 

The synthesis of GW4064 will be described elsewhere (Maloney et 
al., 2000). CDCA, dexamethasone, estradiol, all-trans retinoic acid, 
9-cis retinoic acid, and charcoal-stripped, delipidated calf serum 
were acquired from the Sigma Chemical Co. (St. Louis, MO). 
24(S),25-epoxycholesterol was synthesized in-house. DNA-modi- 
fying enzymes, polymerases, and restriction endonucleases were 
provided by Roche Molecular Biochemicals (Indianapolis, IN). Char- 
coal/dextran-treated fetal bovine serum (FBS) was purchased from 
Hyclone Laboratories Inc. (Logan, UT). The human hepatocellular 
carcinoma cell line HepG2 was obtained from the American Type 
Culture Collection (ATCC number HB-8065, Manassas, VA). Matrigel 
was provided by Becton Dickinson Labware (Bedford, MA). All other 
tissue culture reagents were obtained from Life Technologies Inc. 
(Gaithersburg, MD). 

Animals 

Male Fisher rats were obtained from Charles River Laboratories Inc. 
(Raleigh, NC) and maintained on a 12 hr light/12 hr dark cycle. 
Animals were allowed food and chow ad libitum. GW4064 (30 mg/ 
kg) was administered by gavage twice a day for 7 days, and the 
animals were sacrificed by cervical dislocation 4 hr after the final treat- 
ment. Livers were excised and snap-frozen in liquid nitrogen. Differ- 
ential gene expression analysis was performed by CuraGen Corp. 
(New Haven, CT). 

Plasmid Constructs 

Expression plasmids for the human nuclear receptor-GAL4 chime- 
ras were prepared by inserting amplified cDNAs encoding the li- 
gand-binding domains into a modified pSG5 expression vector 
(Stratagene, La Jolla, CA) containing the GAL4 DNA-binding domain 
(amino acids 1-147) and the Simian virus 40 (SV40) large T antigen 
nuclear localization signal (APKKKRKVG). The (UAS)5-tk-CAT and 
(hsp27EcRE)2-tk-LUC reporter constructs have been previously de- 
scribed (Forman et al., 1995; Parks et al., 1999). pβ-actin-SPAP, an 
expression vector containing the human secreted placental alkaline 
phosphatase (SPAP) cDNA under the control of the β-actin promoter, 
was used as an internal control in all transfections. The expression 
plasmids for human and mouse FXR (pSG5-hFXR and pSG5-mFXR, 
respectively) and human SRC-1 are described elsewhere (Kliewer 
et al., 1998; Parks et al., 1999). The full-length coding regions for 
human LRH-1 (GenBank Accession Number AB019246) and human 
SHP-1 (GenBank Accession Number L76571) were amplified by PCR 
and cloned into pSG5, creating pSG5-hLRH-1 and pSG5-hSHP-1, 
respectively. A consensus Kozak sequence was created during 
amplification. The rat (bases -441 to +19, GenBank Accession 
Number D86745) (Masuda et al., 1997) and human (bases -572 to 
+10, GenBank Accession Number AF044316) (Lee et al., 1998) 
SHP-1 promoters were amplified by PCR using the following primer 
pairs: Rat, 5'-o^gto*jagatiaeCCTGGCTG (sense) and
5'-gggtgtgcgagatctCCTGTTTCTTCCTGGCTCTGTGGC-3' (antisense); and
human, 5'-gggtgtgcgagatctTCCTAGACTGGACAGTGGGCAAAG-3' (sense) and
5'-gggtgtgcgagatctCTTCCAGCTCTCTGGCTCTGTGTT-3' (antisense). The
resultant fragments were inserted into the BglII site of pGL3-Basic,
a promoter-less luciferase reporter vector (Promega, Madison, WI).
Site-directed
mutagenesis of putative FXREs in the rat and human SHP-1 promoters
was performed using the Transformer mutagenesis system
(CLONTECH Laboratories, Palo Alto, CA) with the ΔratIR1 (bases
-321 to -287, 5'-CCTGGTACAGCCTG^aTAATAtaaCTGTTTATAC-3')
and ΔhumanIR1 (bases -304 to -270,
5'-CCTGGTACAGCCTGAaaTAATGtaCTTGTTTATCC-3') primers. Mutated
constructs were verified to be free of nonspecific base changes by
sequencing. pGL3-rCYP7A1(-1573/+36) contains bases -1573 to +36 of
the rat CYP7A1 promoter (GenBank Accession Number Z14108) inserted
into the NheI site of pGL3-Basic. VP16-nuclear receptor chimeras
contain the 80 aa Herpes virus VP16 transactivation domain linked
to the ligand-binding domain of the following nuclear receptors in
a modified pSG5 expression vector: human COUP-TFII, ERα, LRH-1,
LXRα, RARα, and TRβ; mouse FXR, LRH-1, RXRα, and SF-1; and
rat HNF4α.

Transient Transfection Assays 

Transient transfection of CV-1 cells was performed exactly as
described elsewhere (Jones et al., 2000). Typically, transfection mixes
contained 2-5 ng of receptor expression vector, 20 ng of reporter
construct, and 8 ng of pβ-actin-SPAP. The amount of DNA used
in each transfection was adjusted to 80 ng with carrier plasmid
(pBluescript, Stratagene). Mammalian two-hybrid experiments
utilized transfection mixes containing 20 ng of VP16-nuclear receptor
ligand-binding domain expression vector, 5 ng of pSG5-GAL4-SHP-1,
15 ng of (UAS)5-tk-CAT, and 8 ng of pβ-actin-SPAP. Cells
were maintained for 24 hr in the presence of drug (added as a
1000X stock in dimethyl sulfoxide) in DMEM/F-12 nutrient mixture
containing 10% charcoal-stripped, delipidated calf serum. An
aliquot of medium was assayed for SPAP activity, and the cells were
lysed prior to determination of luciferase expression. Luciferase
activities were normalized to SPAP. HepG2 cells were maintained
in DMEM/F-12 supplemented with 10% heat-inactivated FBS (Life
Technologies Inc.). Plasmid DNA was transfected into HepG2 cells
using the FuGENE6 transfection reagent according to the
manufacturer's instructions (Roche Molecular Biochemicals). Thus, 24-well
culture plates (15 mm diameter) were inoculated with 7 x 10^5 cells
24 hr prior to transfection. Cells were transfected overnight in
serum-free DMEM/F-12 with 100 ng of reporter construct, 32 ng of
pβ-actin-SPAP, and 0-400 ng of receptor expression vectors (adjusted
to 400 ng with carrier plasmid). Following transfection, the medium
was aspirated and the cells were cultured for a further 48 hr in
DMEM/F-12 supplemented with 10% heat-inactivated FBS. SPAP
and luciferase values were determined as described above.
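
The two pieces of arithmetic in this protocol, topping up each mix with carrier plasmid and normalizing luciferase to the SPAP internal control, can be sketched as follows. This is a minimal illustration only: the function names and the luciferase/SPAP readings are hypothetical, and only the nanogram amounts (5, 20, 8 and 80 ng) come from the text above.

```python
def carrier_ng(components_ng, total_ng):
    """Return the ng of carrier plasmid needed to bring a mix up to total_ng."""
    used = sum(components_ng.values())
    if used > total_ng:
        raise ValueError("components already exceed the target DNA amount")
    return total_ng - used


def normalize_to_spap(luciferase, spap):
    """Return luciferase activity divided by SPAP activity (internal control)."""
    if spap <= 0:
        raise ValueError("SPAP activity must be positive")
    return luciferase / spap


# CV-1 mix at the upper end of the stated receptor range (2-5 ng).
cv1_mix = {"receptor_vector": 5, "reporter": 20, "b_actin_SPAP": 8}
print(carrier_ng(cv1_mix, total_ng=80))

# Hypothetical raw readings, for illustration only.
print(normalize_to_spap(luciferase=1200.0, spap=4.0))
```

Dividing by the SPAP signal corrects each well for differences in transfection efficiency, which is why the same internal control plasmid is included in every mix.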

Primary Culture of Human and Rat Hepatocytes 
and Northern Blot Analysis 

Primary human hepatocytes were obtained from Dr. Steve Strom
(University of Pittsburgh). Rat hepatocytes were isolated as
described elsewhere (LeCluyse et al., 1996). Cells (1.5 x 10*) were
cultured on Matrigel-coated 6-well plates in serum-free Williams'
E medium supplemented with 100 nM dexamethasone, 100 U/ml
penicillin G, 100 μg/ml streptomycin, and insulin-transferrin-selenium
(ITS-G, Life Technologies Inc.). Twenty-four hours after isolation,
hepatocytes were treated with either GW4064 (0.1-10 μM) or
CDCA (1-100 μM), which were added to the culture medium as
1000X stocks in dimethyl sulfoxide. Control cultures received
vehicle alone. Cells were cultured for a further 48 hr prior to harvest,
and total RNA was isolated using a commercially available reagent
(Trizol, Life Technologies Inc.) according to the manufacturer's
instructions. Total RNA (10 μg) was resolved on a 1% agarose/2.2 M
formaldehyde denaturing gel and transferred to a nylon membrane
(Hybond N+, Amersham Pharmacia Biotech Inc., Piscataway, NJ).
Blots were hybridized with 32P-labeled cDNAs corresponding to
human SHP-1 (GenBank Accession Number L76571), human CYP7A1
(bases 99-1564, GenBank Accession Number M93133), mouse
SHP-1 (bases 30-783, GenBank Accession Number L76567), or rat
CYP7A1 (bases 235-460, GenBank Accession Number J05460).
Subsequently, blots were stripped and reprobed with a radiolabeled
β-actin cDNA (CLONTECH Laboratories).

Electrophoretic Mobility-Shift Assays

Electrophoretic mobility-shift assays (EMSA) were performed
essentially as described elsewhere (Lehmann et al., 1997). hFXR and
hRXRα were synthesized from pSG5-hFXR and pSG5-hRXRα
expression vectors, respectively, using the TNT T7 Coupled
Reticulocyte System (Promega). Unprogrammed lysate was prepared using
the pSG5 expression vector (Stratagene). Binding reactions
contained 10 mM HEPES (pH 7.8), 60 mM KCl, 0.2% Nonidet P-40,
6% glycerol, 2 mM dithiothreitol (DTT), 2 μg of
poly(dI-dC)·poly(dI-dC), and 1 μl each of synthesized hFXR or hRXRα.
Control incubations received unprogrammed lysate alone. Reactions
were preincubated on ice for 10 min prior to the addition of
[32P]-labeled double-stranded oligonucleotide probe (0.2 pmol).
Competitor oligonucleotides were added to the preincubation at 5-, 25-,
and 75-fold molar excess. Samples were held on ice for a further 20 min,
and the protein-DNA complexes resolved on a pre-electrophoresed
5% polyacrylamide gel in 0.5x TBE (45 mM Tris-borate, 1 mM EDTA)
at room temperature. Gels were dried and autoradiographed at
-70°C for 1-2 hr. The following double-stranded oligonucleotides
were used as probes and competitors in EMSA:
rSHP, 5'-gatcCCTGGGTTAATAACCCTGT-3';
mSHP, 5'-gatcCCTGGGTTAATGACCCTGT-3';
hSHP, 5'-gatcCCTGAGTTAATGACCTTGT-3';
mI-BABP, 5'-gatcTTAAGGTGAATAACCTTGG-3';
hI-BABP, 5'-gatcCCAGGTGAATAACCTCGG-3' (Grober et al., 1999);
and mSHPmut, 5'-gatcCCTGGaaTAATGttCCTGT-3'.
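
As a worked example of the competition series, the amount of unlabeled competitor per reaction follows directly from the probe amount (0.2 pmol) and the stated 5-, 25- and 75-fold molar excesses. The sketch below is illustrative only; the function name is hypothetical, and amounts are kept in femtomoles so that the arithmetic stays in whole numbers.

```python
PROBE_FMOL = 200            # 0.2 pmol of labeled probe per reaction
FOLD_EXCESS = (5, 25, 75)   # molar excesses stated in the protocol


def competitor_fmol(probe_fmol, fold):
    """Return fmol of unlabeled competitor for a given fold molar excess."""
    if probe_fmol <= 0 or fold <= 0:
        raise ValueError("probe amount and fold excess must be positive")
    return probe_fmol * fold


for fold in FOLD_EXCESS:
    fmol = competitor_fmol(PROBE_FMOL, fold)
    print(f"{fold}-fold excess: {fmol} fmol ({fmol / 1000} pmol) competitor")
```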

GST Pull-Down Assays 

GST-SHP-1 fusion protein was expressed in BL21(DE3)pLysS cells
and bacterial extracts prepared by one cycle of freeze-thaw of the
cells in protein lysis buffer containing 50 mM Tris (pH 8.0), 250
mM KCl, 1% Triton X-100, 10 mM DTT and 1x Complete Protease
Inhibitor (Roche Molecular Biochemicals) followed by centrifugation
at 40,000 x g for 30 min. Glycerol was added to the resultant
supernatant to a final concentration of 10%. Lysates were stored at -80°C
until use. [35S]-labeled human LRH-1 or human RXRα was generated
using the TNT T7 Coupled Reticulocyte System (Promega) in the
presence of Pro-Mix (Amersham Pharmacia Biotech Inc.).
Coprecipitation reactions included 25 μl of lysate containing GST-SHP-1 fusion
protein or control GST; 25 μl of incubation buffer (50 mM KCl, 40
mM HEPES [pH 7.5], 5 mM β-mercaptoethanol, 0.1% Tween 20 and
1% nonfat dry milk); and 5 μl of [35S]-labeled LRH-1 or RXRα. The
mixtures were incubated for 25 min with gentle rocking at 4°C
prior to the addition of 20 μl of glutathione-Sepharose 4B beads
(Amersham Pharmacia Biotech Inc.) that had been extensively
washed in protein lysis buffer. Reactions were incubated at 4°C with
gentle rocking for a further 20 min. The beads were pelleted at 3000
rpm in a microfuge and washed four times with protein incubation
buffer. Following the final wash, the beads were resuspended in 25
μl of 2x SDS-PAGE sample buffer containing 50 mM DTT. Samples
were heated to 100°C for 5 min and resolved on a 10% acrylamide
gel. Autoradiography was performed overnight.

Statistical Analyses 

Unless otherwise stated, data are expressed as mean ± standard
deviation (S.D.). The significance of differences in SHP-1 and
CYP7A1 expression between vehicle- and GW4064-treated animals
was analyzed using an unpaired Student's t-test.
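
For readers who want the statistics spelled out, the following is a minimal sketch of a mean ± S.D. summary and an unpaired Student's t statistic with pooled variance. The expression values are invented for illustration and are not data from the study; the function names are likewise hypothetical. The sketch returns the t statistic and degrees of freedom only, since converting these to a P value requires a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_sd(values):
    """Return (mean, sample standard deviation) for a list of values."""
    return mean(values), stdev(values)


def unpaired_student_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for the equal-variance unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical relative expression values (vehicle vs. treated animals).
vehicle = [1.0, 1.2, 0.9, 1.1]
treated = [2.1, 2.4, 1.9, 2.2]

print("vehicle mean, S.D.:", mean_sd(vehicle))
print("treated mean, S.D.:", mean_sd(treated))
print("t, df:", unpaired_student_t(vehicle, treated))
```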
The rapid proliferation and identification of newly cloned GPCRs
reveal a much greater diversity within this supergene family than
was previously considered at the pharmacological level.
The transfer of information across
the cell plasma membrane is a critical feature for the proper
functioning of living cells. For many hormones, neurotransmitters
and chemotactic factors, signal transduction is accomplished
through the specific interaction of these bioactive molecules
(agonists) with cell-surface receptors that couple to guanine
nucleotide-binding regulatory proteins (G-proteins) (for a review
see reference 1). The consequence of receptor occupancy by agonist
is the generation of an intracellular second messenger signal that
causes the cell to respond in an appropriate manner.
G-protein-coupled receptors (GPCRs) play a key role in many
physiologic processes, including nerve-to-nerve transmission,
cardiac and smooth muscle contraction/relaxation, endocrine and
exocrine secretion and chemotaxis. The fact that GPCRs mediate a
broad spectrum of cellular events makes these proteins an ideal
target for drug interaction and therapeutics.

As with all members of the GPCR gene family, the mechanism of
signal transduction involves receptor coupling to a G-protein (for
reviews see references 2 and 3). G-proteins are heterotrimeric
proteins formed of a single GDP-bound α-subunit, one β-subunit
and one γ-subunit. In response to agonist binding, GPCRs undergo a
change in conformation (receptor-activated state) that triggers the
formation of an agonist/receptor/G-protein ternary complex.
Concomitant to ternary complex formation is the exchange of GDP for
GTP on the α-subunit, thereby freeing the α-subunit from the
βγ-subunits. Consequently, the GTP-containing α-subunit (and in
some cases the βγ-subunits) acts to stimulate or inhibit an array
of effector enzymes including adenylyl and guanylyl cyclase,
phospholipase A and C, phosphodiesterases and ion channels.
Termination of the signal transduction cascade is accomplished by
the intrinsic GTPase activity found on the α-subunit. Hydrolysis of
bound GTP to GDP and inorganic phosphate leads to reassociation of
the α-subunit with the βγ-subunits and dissociation of the
agonist/receptor/G-protein complex.
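
The activation cycle described above can be summarized as a small state machine. This is only a schematic sketch: the state and event names are invented for illustration, and ternary complex formation and GDP/GTP exchange, which the text describes as concomitant, are shown as two separate steps for readability.

```python
from enum import Enum, auto


class GProteinState(Enum):
    RESTING = auto()          # GDP-bound alpha-subunit associated with beta-gamma
    TERNARY_COMPLEX = auto()  # agonist/receptor/G-protein complex formed
    ACTIVE = auto()           # GTP-bound alpha-subunit freed from beta-gamma


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (GProteinState.RESTING, "agonist_binding"): GProteinState.TERNARY_COMPLEX,
    (GProteinState.TERNARY_COMPLEX, "gdp_gtp_exchange"): GProteinState.ACTIVE,
    (GProteinState.ACTIVE, "gtp_hydrolysis"): GProteinState.RESTING,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


def run_cycle(events, state=GProteinState.RESTING):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final = run_cycle(["agonist_binding", "gdp_gtp_exchange", "gtp_hydrolysis"])
print(final.name)
```

The point of the table-driven form is that signal termination is explicit: only GTP hydrolysis returns the system to the resting heterotrimer.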

The first member of the GPCR gene family whose sequence was
elucidated was the visual photoreceptor rhodopsin. 4,5 During the
past ten years, the number of cloned receptors has steadily risen
and now approaches 200. 6 These proteins are single polypeptides
ranging in size from about 400-1000 amino acids. The activating
ligand for GPCRs varies widely in character (Table I), yet these
receptors share a highly conserved structure and topography. The
hallmark feature of GPCRs is the presence of seven relatively
hydrophobic domains, each 20-28 amino acids in length, that are
presumed to span the lipid bilayer in an α-helical arrangement
(Fig. 1). For the most part, it is the membrane-spanning regions
which exhibit the greatest degree of amino acid sequence identity,
ranging from 20% to more than 50%, depending on which receptor
proteins are being compared. 7 More divergent
TABLE I: ENDOGENOUS LIGANDS
FOR G-PROTEIN-COUPLED RECEPTORS

Biogenic amines/neurotransmitters
  Acetylcholine
  Adenosine
  Dopamine
  Epinephrine
  Glutamate
  Histamine
  Norepinephrine
  Octopamine
  Serotonin

Peptides/peptide hormones
  Angiotensin
  Bombesin-like peptides (neuromedin B, gastrin-releasing peptide)
  Bradykinin
  C5a anaphylatoxin
  Calcitonin
  Endothelin
  N-formyl peptide
  Interleukin-8
  Neuromedin K (also known as neurokinin B)
  Neuropeptide Y
  Neurotensin
  Parathyroid hormone/parathyroid related peptides
  Secretin
  Somatostatin
  Substance K (also known as neurokinin A)
  Substance P
  Thyrotropin-releasing hormone
  Vasoactive intestinal peptide

Glycoprotein hormones
  Follicle-stimulating hormone
  Lutropin/choriogonadotropin
  Thyroid-stimulating hormone (also known as thyrotropin)

Regulatory factors
  cAMP
  Cannabinoids
  Platelet-activating factor
  Thromboxane A2
  Thrombin
  Yeast-mating factors (a and alpha-pheromones)

Miscellaneous
  Light
  Odorants


are the extracellular amino- and intracellular carboxyl-terminal
regions, as well as the six hydrophilic regions that connect the
hydrophobic domains of the receptor to form alternating
extracellular (e1, e2, e3) and intracellular (i1, i2, i3) loops
(Fig. 1). This current model for the tertiary structure of GPCRs is
based on analogy with bacteriorhodopsin, a light-activated proton
pump whose three-dimensional structure was deduced from electron
microscopy. 8,9 The structure of bacteriorhodopsin is seen as







Fig. 1. Model of the structural domains of G-protein-coupled receptors. The transmembrane domains
are depicted as cylinders perpendicular to the plane of the plasma membrane. Transmembrane
domains 1-7 (TM1-TM7) are proposed to traverse the membrane in an alpha-helical fashion and be
connected by alternating extracellular (e1-e3) and intracellular (i1-i3) loops. The amino- (NH2) and
carboxyl- (COOH) terminal regions of G-protein-coupled receptors are situated at the extracellular
and intracellular sides of the plasma membrane, respectively.



having seven transmembrane α-helices connected by hydrophilic
loops, with the transmembrane domains being arranged in bundles
perpendicular to the lipid bilayer. In addition, both
bacteriorhodopsin and the GPCR rhodopsin contain the light-absorbing
molecule 11-cis-retinal. A conserved Lys residue found in the same
relative position on transmembrane domain 7 (TM7) in
bacteriorhodopsin as in rhodopsin serves as the covalent attachment
point for the chromophore. Although bacteriorhodopsin does not
belong to the family of GPCRs, the structural similarities between
these two classes of proteins are noteworthy.

Based on primary sequence analysis, members of the GPCR gene family
can be categorized into distinct subfamilies (Figs. 2 and 3). These
include receptors that bind the biogenic amines (e.g., epinephrine,
dopamine, acetylcholine), glycoprotein hormones (e.g., thyrotropin,
follicle-stimulating hormone, lutropin/choriogonadotropin) and
neurokinins (e.g., substance P, substance K, neuromedin K). The
recent cloning of the calcitonin, parathyroid hormone and secretin
receptors represents the delineation of yet another subfamily of
GPCRs. These receptors are more closely related to one another (up
to 42% sequence identity) than to any of the other seven
transmembrane-spanning GPCRs (less than 12%). 10-12 In many
instances, a receptor within a subfamily can be further divided
into subtypes, each encoded by a separate gene. For example,
muscarinic acetylcholine receptors (mAChRs) comprise at least five
distinct subtypes (m1, m2, m3, m4, and m5). 13 Similarly, discrete
molecular subtypes of the dopamine receptor have been described
(D1, D2, D3, D4, D5). 14
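
The percent-identity figures quoted above (for example, up to 42% within the calcitonin/parathyroid hormone/secretin subfamily) are computed over aligned positions. A minimal sketch of that calculation is given below. The toy sequences are invented, and the convention of skipping any column that contains a gap is an assumption, since the text does not say how gaps were handled.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over aligned columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real receptor sequences.
print(percent_identity("LAVADLLV", "LACADLFV"))
print(percent_identity("LAVA-LLV", "LACADLFV"))
```

Restricting the comparison to chosen columns mirrors the approach described for Fig. 2, where only the aligned transmembrane regions were used in the distance calculations.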

During the past five years, considerable insights have been gained
into the structure-function relationship of GPCRs through the
construction of mutant receptor genes. 1,15 Inferences about
receptor structure and function have been deduced from the
phenotypes of the mutant proteins. In vitro mutagenesis of GPCRs
has been used to 1) identify the amino acids critical for ligand
binding; 2) determine the domains on the receptor responsible for
interacting with G-proteins; and 3) analyze the molecular basis of
receptor desensitization. By using molecular modeling techniques in
conjunction with information gained by mutational analysis, a
better understanding of the roles played by various regions of the
receptor protein will provide the rationale for future drug
design. 16-18

Fig. 2. Relative homology of G-protein-coupled receptors. Sequences were aligned using
CLUSTAL 60 and refinements to the alignment were made manually. The dendrogram was created using the
DeSoete Tree Fit 61 and TreeTool (Mike Maciukenas, University of Illinois, unpublished). Only the
aligned transmembrane regions were used in the distance calculations. The lengths of the lines are
proportional to the percent difference between any two given sequences. All programs were run using
the Genetic Data Environment (Steve Smith, Harvard University, unpublished). The considered
receptors are as follows: hm1mAChR, human m1 muscarinic 62; hAlaAR, human alpbaj.-adrenergic w;
hB2AR, human beta2-adrenergic 64; hD1DR, human D1 dopamine 65; mDOR, mouse delta-opiate 66;
hFMLPR, human N-formyl peptide 67; hSKR, human substance K 68; hSPR, human substance P 69;
hFSHR, human follicle stimulating hormone 70; hLH/CGR, human lutropin/choriogonadotropin 71;
hTSHR, human thyrotropin 72; hOLFR, human olfactory 75; hOPS, human rhodopsin 5; pCTR,
porcine calcitonin 10; hSCR, human secretin. 12



Amino-terminal domain and
extracellular loops

An interesting aspect concerning a number of GPCRs is the apparent
lack of an amino-terminal signal peptide sequence. The signal
peptide has been demonstrated to be essential for the proper
function of integral and secreted proteins, 19 suggesting that an
internal signal sequence must exist on those GPCRs lacking an
amino-terminal one. In contrast, for GPCRs containing a large
amino-terminal domain (more than 300 amino acids), such as the
metabotropic glutamate receptor (mGluR) and glycoprotein hormone
receptors, the presence of a signal sequence has been noted on the
amino-terminus. 20-22 Indeed, the existence of an amino-terminal
signal has been confirmed experimentally where the first 26 amino
acids deduced from the cDNA sequence of the
lutropin/choriogonadotropin receptor (LH/CG-R) are absent on the
amino acid sequence derived from purified LH/CG-Rs. 23

Within the amino-terminal domain of all GPCRs are two or more
consensus sequences (Asn-X-Ser/Thr) for N-linked glycosylation. For
biogenic amine receptors, it is apparent that N-linked
glycosylation is not crucial in ligand (agonist and antagonist)
binding or receptor/G-protein coupling. For example, treatment of
purified β-adrenergic receptors (βAR) with endoglycosidases to
remove carbohydrate moieties has no apparent effect on the ligand
binding and coupling properties of the reconstituted
receptor. 24,25 Inhibitors of N-linked glycosylation (e.g.,
tunicamycin) are equally impotent in affecting ligand binding to
newly synthesized receptors. 26 Similar results are seen with the
expression of mutant βARs and mAChRs. 27,28 It is likely that
glycosylation is essential for the subcellular distribution of
some, but not necessarily all, GPCRs. In the case of βARs, mutant
receptors lacking consensus glycosylation sites do not traffic
correctly to the cell surface. 27 Whether the trafficking defect is
due to a decrease in the translocation of receptors from internal
stores to the cell surface or an increase in the rate of cell
surface receptor internalization remains to be resolved.
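
The Asn-X-Ser/Thr consensus mentioned above is simple enough to scan for directly. The sketch below is illustrative only: the function name and test sequence are made up, and the exclusion of proline at the X position is a commonly used refinement that is not stated in the text, so it is exposed as an optional assumption.

```python
import re


def find_sequons(sequence, exclude_proline=True):
    """Return 1-based positions of Asn in Asn-X-Ser/Thr consensus sites.

    exclude_proline=True skips sites where X is Pro; this refinement is an
    assumption added here and is not taken from the text.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead keeps matches zero-width so overlapping sites are all found.
    pattern = re.compile(f"(?=N{middle}[ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Invented amino-terminal fragment, for illustration only.
fragment = "MNGTEGPNFSNPT"
print(find_sequons(fragment))
print(find_sequons(fragment, exclude_proline=False))
```

A hit from such a scan marks only a potential site; as the paragraph above notes, whether glycosylation at a given site matters for binding or trafficking has to be tested experimentally.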

GPCRs whose endogenous ligands are biogenic amines lack significant
amino-terminal domains (less than 50 amino acids). Early studies
focused attention on this region as a potential candidate for the
ligand binding domain. The role of the extracellular domains in
ligand binding is best exemplified by the βAR, a prototypical
biogenic amine receptor. When solubilized βAR is treated with
proteolytic enzymes, the



[Fig. 3 alignment panel: transmembrane domains TM1-TM7 for hB1AR, hB2AR,
ham alpha1bAR, halpha2aAR, hD1DR, h5HT1aR, hm1mAChR, hm2mAChR, mDOR,
hFMLPR, hSPR, hSKR, mTRKR, hLH/CGR, hFSHR, hTSHR, rmGluR1 and rmGluR2;
residue-level sequences not legible.]

Fig. 3. Aligned amino acid sequences of the seven transmembrane domains (TM 1-7) and adjacent residues of G-protein-coupled receptors. Bold residues
represent highly conserved amino acids. Shaded residues represent conserved residues within a subfamily of receptors. The considered receptors are as
follows: hbeta1AR, human beta1-adrenergic 74; hamalpha1bAR, hamster alpha1b-adrenergic 75; halpha2aAR, human alpha2A-adrenergic 41; h5HT1aR,
human 5-HT1a 76; mTRKR, mouse thyrotropin releasing hormone 77; rmGluR1, rat metabotropic glutamate receptor 1 22; and rmGluR2, rat metabotropic
glutamate receptor 2. 22 References for remaining sequence data can be found in Figure 2.



resulting hydrophobic core retains its capacity to bind the
antagonist [125I]-iodocyanopindolol (ICYP). 29 Furthermore, the
cryptic core is still able to activate the G-protein Gs in response
to agonists, which suggests that the hydrophilic extracellular
regions of the receptor are not crucial for receptor-ligand
interactions. Utilization of genetic techniques has further
delineated the role of the extracellular domains on biogenic amine
receptors in ligand binding. Deletion mutagenesis of the β2AR
revealed that, for the most part, the amino- and carboxyl-terminal
domains and e1, e2 and e3 do not contribute to the binding of ICYP
and the agonist isoproterenol. 30,31 In contrast, removal of any of
the transmembrane domains practically abolishes ligand binding. It
is apparent from these studies that the binding domain of at least
one subfamily of GPCRs (biogenic amine receptors) does not involve
the extracellular hydrophilic regions, but actually resides in the
transmembrane domains. The same is likely true for the receptors
that bind small peptide hormones, but confirmation awaits future
experiments. For the glycoprotein hormone receptors, the large
amino-terminus (more than 300 amino acids) contains 14 imperfect
Leu-rich repeat domains. 20,21,23 It is thought that the large
glycoprotein hormones (28-38 kDa) bind to this repeat structure
before interacting secondarily with the membrane-spanning regions.
Through the construction of chimeric receptors between members of
this receptor subfamily, the extracellular amino-terminal domain
has been established as the ligand binding site. 32,33 In fact, the
extracellular domain of the LH/CG-R (minus the remainder of
protein) can be expressed in transfected cells that consequently
bind choriogonadotropin with high affinity. 34

A structural feature shared by all GPCRs is the presence of a
conserved Cys residue on e1 and another on e2. These residues have
been implicated in the formation of a disulfide bond, since
replacement of either one of these residues at position 106
(Cys106) or 184 (Cys184) with Val in the β2AR produces a mutant
receptor with altered agonist binding properties. 50 Similarly,
mutation of Cys98 or Cys178 in the m1mAChR, and Cys110 or Cys187 in
rhodopsin, completely abolishes ligand binding. 35,36 Peptide
sequencing of the m1mAChR has confirmed the involvement of these
conserved Cys residues in disulfide bond formation. 37 From these
studies, it is believed that the disulfide linkage does not
participate directly in ligand binding per se, but rather serves a
physical role by maintaining the tertiary structure of GPCRs.

Conserved amino acids in the 
transmembrane domains 

Comparison of the deduced amino acid sequences of members of the
GPCR gene family has led to the identification of conserved
residues located in several transmembrane domains (Fig. 3). Some
residues appear to be globally conserved in the majority of GPCRs
despite major structural differences in the endogenous ligands that
bind to this family of receptors (e.g., catecholamines, peptides,
glycoprotein hormones). One hypothesis is that these highly
conserved residues play a common functional or structural role, for
example, in the process of receptor activation. Among the conserved
residues found throughout the GPCRs are the Gly-Asn pair in TM1, a
Leu-Ala-X-X-Asp-Leu motif in TM2, an almost canonical motif of
Asp-Arg-Tyr at the TM3/i2 junction, an invariable Trp residue in
TM4, and a Pro residue flanked by aromatic amino acids in TM5, TM6
and TM7. Interestingly, the secretin/parathyroid hormone/calcitonin
receptor subfamily and the mGluR subfamily are practically devoid
of these conserved amino acids. In contrast to the globally
conserved residues, other amino acids (found only in a subfamily of
GPCRs) are postulated to be involved in receptor class-specific
functions, such as the binding of biogenic amine ligands. These
include the conserved Asp residue in TM3 and a Ser-X-X-Ser motif in
TM5 of the biogenic amine receptors (Fig. 3).
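
Three of the motifs named above translate directly into one-letter patterns: Leu-Ala-X-X-Asp-Leu (LA..DL), Asp-Arg-Tyr (DRY) and Ser-X-X-Ser (S..S). The following sketch shows how such motifs might be located in a sequence. The function name and the test sequence are hypothetical, and a real analysis would restrict each search to the relevant transmembrane segment rather than scanning the whole protein.

```python
import re

# One-letter regular expressions for motifs named in the text; "." stands for X.
MOTIFS = {
    "TM2 Leu-Ala-X-X-Asp-Leu": "LA..DL",
    "TM3/i2 Asp-Arg-Tyr": "DRY",
    "TM5 Ser-X-X-Ser": "S..S",
}


def locate_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [1-based start positions]} for each motif."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in motifs.items():
        # Lookahead makes matches zero-width so overlapping hits are reported.
        regex = re.compile(f"(?={pattern})")
        hits[name] = [match.start() + 1 for match in regex.finditer(sequence)]
    return hits


# Invented sequence containing one copy of each motif.
toy = "AALAVTDLGGDRYSAASK"
for name, positions in locate_motifs(toy).items():
    print(name, positions)
```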

As mentioned above, one of the general features common to a
majority of GPCRs is the presence of a pair of Asp residues, one
located in the Leu-Ala-X-X-Asp-Leu motif of TM2 and the other
situated in the Asp-Arg-Tyr motif at the junction of TM3 and i2
(Fig. 4). The importance of these two Asp residues in receptor
function has been well documented for the β2AR, m1mAChR,
α2A-adrenergic receptor (α2AAR) and dopamine D1 receptor. 38-40
Expression of the human α2AAR gene in Chinese hamster ovary cells,
cells that normally lack endogenous adrenergic receptors, leads to
a pertussis toxin-sensitive inhibition of adenylyl cyclase activity
following epinephrine exposure. 41 In pertussis toxin-pretreated
cells, however, agonist-mediated activation of α2AAR leads to an
increase in cAMP levels. 41 Substitution of Asp79 with asparagine
(Asn) in TM2 of the α2AAR produces a mutant receptor displaying
high-affinity agonist binding and relatively normal antagonist
binding properties. 38 However, the ability of adrenergic agonists
to attenuate adenylyl cyclase activity, as well as enhance cAMP
levels in pertussis toxin-pretreated cells, is abolished.
Consistent with the inability of agonist to activate mutant
[Asn79]α2AARs was the observed lack of guanine nucleotide-sensitive
high-affinity agonist binding. Asp130 at the TM3/i2 junction of the
α2AAR also appears to influence receptor/G-protein coupling.
Mutation of this residue to Asn eliminates high-affinity, guanine
nucleotide-sensitive agonist binding. Moreover, agonist-mediated
inhibition of adenylyl cyclase activity is markedly attenuated,
while elevation of cAMP levels is abolished in pertussis
toxin-treated cells.

Similar Asp-to-Asn mutations in the corresponding positions of the
β2AR and m1mAChR either significantly attenuate or completely
eliminate the ability of the mutant receptors to activate adenylyl
cyclase and phospholipase C activities, respectively. 42-44 On the
other hand, the effects of these mutations on ligand binding were
nominal. Whereas muscarinic agonist and antagonist binding are
relatively unaffected in the [Asn71]m1mAChR and [Asn122]m1mAChR
mutants, adrenergic agonist affinity is decreased in [Asn79]β2AR
and slightly increased in [Asn130]β2ARs. 42-44 Taken together,
these studies suggest that the conserved Asp residues in TM2 and at
the TM3/i2 junction are crucial for agonist-induced receptor
activation or receptor conformational changes. It has previously
been speculated that these invariant negatively charged residues
may bind cations and serve as a "charge relay system" during
receptor activation by agonists. 45 It is plausible that the
movement of these ions is key to receptor conformational changes
following agonist binding. In fact, Asp79 is known to be involved
in sodium-dependent allosteric regulation of α2AAR. 46
Interestingly, the mutant [Asn79]α2AAR was found to couple to
inhibition of adenylyl cyclase and calcium currents but not to
potassium channel activation in AtT20 mouse pituitary tumor cells,
suggesting that α2AARs undergo different conformations to couple to
different G-proteins. 47

There exists in many G-protein-coupled receptors an Asp residue situated
near the extracellular side of TM3 (Fig. 4). Replacement of Asp113 with Asn
in the α2AAR abolishes yohimbine binding and markedly decreases agonist
stimulation of the mutant receptor. 38 Mutation of the corresponding Asp
residue in both β2ARs and m1mAChRs likewise affects ligand binding. 42-44
It is unlikely that mutation of this residue alters normal receptor
processing and insertion into the lipid bilayer, since [Asn113]β2AR can be
detected by immunoblotting in membrane preparations. 48 These findings are
consistent with the hypothesis that this Asp residue that is conserved
among all biogenic amine receptors, including the αAR, βAR, mAChR,
dopamine receptor and serotonin receptor, is involved in an



492 



DN&P 6(7). September 1993 





[Fig. 4 sequence alignment: Transmembrane II, Loop and Transmembrane III
regions of the α-adrenergic (hamster α1, human α2), β-adrenergic (human β1,
human β2, rat β2), muscarinic (rat m1-m5), dopamine (rat D2) and
5-hydroxytryptamine (human 5-HT1a, rat 5-HT1c, rat 5-HT2) receptors. The
aligned sequences are not legible in this copy.]



Fig. 4. Conservation of aspartate residues in TM2 and TM3 among members of the biogenic amine receptor subfamily. The numbering and location of
the conserved aspartate residues are depicted using a model of the human alpha2A-adrenergic receptor. References for the sequence data can be found
in reference 1. (From: Wang, C.-D., Buck, M.A. and Fraser, C.M. Mol Pharmacol 1991, 40: 168-79; reproduced with permission.)



electrostatic interaction with the cationic amine moiety of their
respective ligands.

Ser residues in TM5 are conserved as a pair (Ser-X-X-Ser motif) among
biogenic amine receptors that bind catecholamines but not in those
receptors whose endogenous ligand lacks a catechol moiety (e.g.,
acetylcholine) (Fig. 5). Structure-function analysis of the β2AR has
implicated the hydroxyl side-chain of Ser204 and Ser207 in hydrogen bond
formation with the meta- and para-hydroxyl groups of catecholamines. 49
Substitution of either Ser residue with alanine (Ala) attenuates the
activity of catecholamine agonists at the mutant receptors. The effects of
these mutations on agonist activity can be mimicked by the interaction of
meta- and para-hydroxyl-substituted analogs with the wild-type receptor.
Hence, at the [Ala204]β2AR mutant, isoprotere-
Human β1AR        AYAIASSWSFYVPLCIMAFVYL
Human β2AR        AYAIASSIVSFLVPLVIMVFVYS
Human α2AAR       WYILSSCIGSFFAPCLIMGLVYA
Human α2 (C-4)    WYILSSCIGSFFAPCLIMGLVYL
Hamster α1AR      FYALFSSLGSFYIPLAVILVMYC
Rat D2DR          AFWYSSIVSFYVPFIVTLLVYI
Dros octop        GYVIYSSLGSFFIPIAIMTIVYI
Rat 5HT-1A        GYTIYSTFGAFYIPLLLMLVLYG
Rat 5HT-1C        NFVLIGSFVAFFIPLTIMVITYF
Human m1          IITFGTAMAAFYLPVTVMCTLYN
Human m2          AVTFGTAIAAFYLPVIIMTVLYW
Human opsin       SFVIYMFWHFIIPLIVIFFCYG



[Fig. 5, bottom panels: A. Alpha2-Adrenergic Receptor; B. Beta2-Adrenergic
Receptor.]



Fig. 5. Conservation of serine residues in TM5 among certain members of the
biogenic amine receptor subfamily. Top: Alignment of the deduced amino acid
sequences from TM5 of selected G-protein-coupled receptors. Amino acid
sequences were aligned to maximize homologies within this region. *,
conserved serine residues in a Ser-X-X-Ser motif. References for the
sequence data can be found in reference 1. Bottom: Model comparing the
ligand binding site of the alpha2-adrenergic (A) and beta-adrenergic (B)
receptors. The view of the receptors is from the extracellular face of the
plasma membrane. The seven alpha-helices are numbered I-VII. Locations of
the conserved aspartate (Asp) and serine (Ser) residues implicated in
ligand binding are indicated. Ligand binding model for the beta-adrenergic
receptor has been adapted from reference 48 or 51. (From: Wang, C.-D.,
Buck, M.A. and Fraser, C.M. Mol Pharmacol 1991, 40: 168-79; reproduced with
permission.)



nol and its meta-substituted analog display only partial agonist activity,
whereas the para-substituted analog exhibits no intrinsic agonist activity.
Conversely, isoproterenol and its para-substituted analog show partial
agonist activity at the [Ala207]β2AR mutant, but the meta-substituted
analog is devoid of activity.

In a somewhat analogous manner to that of the [Ala204]β2AR mutant, when
Ser204 is substituted with Ala in the α2AAR, epinephrine and phenylephrine
(meta-substituted) elicit 100% maximal agonist activity at the mutant
receptor, whereas synephrine (para-substituted) displays only partial
agonist activity. 38 Based on these findings, it was postulated that Ser204
in the α2AAR functions in a manner similar to that of Ser207 in the β2AR by
participating in hydrogen bond formation with the para-hydroxyl group from
the catechol ring structure of catecholamine agonists. There exists a
second Ser residue four amino acids upstream from Ser204 in the α2AAR
(Fig. 5). Mutation of this residue at position 200 of the α2AAR produces a
mutant receptor that is fully activated by epinephrine, phenylephrine and
synephrine. 38 Thus, Ser200 appears not to directly participate in the
ligand binding process. This finding is not totally unexpected, since
Ser204 and Ser207 in the β2AR are located three positions apart in TM5,
which is presumed to form an α-helix, compared with a distance of four
residues apart for Ser200 and Ser204 in the α2AAR. Since one turn of an
α-helix encompasses 3.6 amino acids, the hydroxyl group of Ser200 in the
α2AAR would assume a different orientation in the helix compared with
Ser204 in the β2AR. Thus, it is possible that the meta-hydroxyl group of
catecholamine agonists interacts with the sulfhydryl side-chain of Cys at
position 201 of the α2AAR, which is located in the same relative position
in TM5 as Ser204 of the β2AR (Fig. 5).
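
The helix-geometry argument above can be checked with simple arithmetic.
The Python sketch below is an illustration only and is not taken from the
cited studies. It assumes an ideal α-helix with exactly 3.6 residues per
turn (100 degrees of rotation per residue) and ignores any distortion of
TM5 in the real receptors; the function name angular_offset is
hypothetical.

```python
# Idealized alpha-helix: 3.6 residues per turn, i.e. 100 degrees per residue.
# This is an assumption for illustration; real transmembrane helices deviate.
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN


def angular_offset(pos_a: int, pos_b: int) -> float:
    """Signed rotation in degrees, in (-180, 180], of the side chain at
    pos_b relative to the side chain at pos_a, viewed down the helix axis."""
    raw = ((pos_b - pos_a) * DEGREES_PER_RESIDUE) % 360.0
    return raw - 360.0 if raw > 180.0 else raw


if __name__ == "__main__":
    pairs = [
        ("beta2AR Ser204 -> Ser207 (3 apart)", 204, 207),
        ("alpha2AAR Ser200 -> Ser204 (4 apart)", 200, 204),
        ("alpha2AAR Cys201 -> Ser204 (3 apart)", 201, 204),
    ]
    for label, a, b in pairs:
        print(f"{label}: {angular_offset(a, b):+.0f} degrees")
```

Under this idealization, residues three positions apart (Ser204/Ser207 in
the β2AR, Cys201/Ser204 in the α2AAR) have the same relative orientation,
whereas Ser200 and Ser204, four positions apart, do not. Ser200 is rotated
about 100 degrees away from position 201, which is consistent with the
suggestion that Cys201 rather than Ser200 occupies the position equivalent
to Ser204 of the β2AR.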

C-terminal domain and the 
intracellular loops 

It has been widely presumed that the cytoplasmic loops of GPCRs form an
interface between the receptor and G-protein. Several lines of evidence,
involving both biochemical and genetic approaches, now lend support to this
hypothesis. Findlay and Pappin 50 revealed early on that proteolytic
digestion of i3 of rhodopsin abolished its interaction with the G-protein
transducin, thus implicating this domain as the major constituent involved
in the coupling process. This finding has been extended to the biogenic
amine receptor subfamily through the use of deletion and site-directed
mutagenesis. When a large 33-amino-acid deletion (residues 229-258),
corresponding to the middle segment of i3, was made in the hamster β2AR, no
detectable effect on the ability of the receptor to stimulate adenylyl
cyclase was seen. 51 However, deletion of the amino- (222-229) and
carboxyl- (258-270) terminal portions of this loop caused marked reductions
in agonist-dependent stimulation of adenylyl cyclase. 51 These two short
peptide segments are believed to form amphipathic helices that interact
with Gs during the process of receptor activation.

Several mutations made by O'Dowd et al. 52 indicate that other regions on
the β2AR protein, besides portions of i3, may contribute to receptor
coupling. Deletions in i1 and i2 produced mutant receptors with reduced
capacity to couple to Gs and stimulate adenylyl cyclase, thereby suggesting
a role for these two loops in receptor-G-protein interactions. Furthermore,
mutation of a conserved Cys residue (position 341) in the cytoplasmic tail
of the β2AR impaired the ability of isoproterenol to stimulate adenylyl
cyclase. 53 Cys341 undergoes palmitoylation and the fatty acid moiety is
proposed to insert itself into the lipid bilayer, thus creating an
additional cytoplasmic loop. The amino-terminal segment of this "fourth"
intracellular loop is speculated to play a role in the coupling of the βAR
to Gs, presumably by maintaining proper orientation of the other G-protein
binding domains. 53

Data obtained on glycoprotein hormone receptors support the general notion
of multiple intracellular regions participating in the coupling process.
Site-directed mutagenesis of the thyrotropin receptor provides evidence on
the importance of i1 and the carboxyl-terminal portions of both i2 and i3
in signal transduction. 54 In contrast, deletion of two thirds of the
carboxyl-terminal end of the cytoplasmic tail does not functionally impair
the thyrotropin receptor. 54 It is not known with certainty whether the
remaining amino-terminal portion of the tail, like in the β2AR, 53 is
important in receptor-G-protein coupling.

Chimeric receptors have been constructed to identify the intracellular
regions important for defining selective receptor/G-protein interactions.
Studies with chimeric m1/m2 or m2/m3 mAChRs indicate that i3 is sufficient
in determining the selective coupling of these receptor subtypes to their
respective effector enzymes. 55,56 Similar findings have been reported for
chimeric α2/β2- and β2/α1-ARs. 57,58 However, it is likely that multiple
cytoplasmic domains are required for G-protein binding specificity. Wong et
al. 59 have shown that substitution of a 12-amino-acid segment (in the
amino-terminus of i3) of the β1AR into the corresponding position of the
m1mAChR is enough to confer Gs, without disturbing Gp, coupling to the
latter receptor. Only upon additional substitution of the corresponding i2
domains was Gp coupling to the m1mAChR abolished. 59 Hence, these data
demonstrate the pivotal, although not exclusive, role of i3 in selective
effector coupling.

Concluding remarks 

The rapid proliferation and identifi- 
cation of newly cloned GPCRs reveal a 
much greater diversity within this su- 
pergene family than was previously 
considered at the pharmacological lev- 
el. As more receptors are cloned, the use 
of site-directed mutagenesis in conjunc- 
tion with molecular modeling tech- 
niques will help better define the func- 
tional domains of these proteins. 
Ultimately, it is this knowledge which 
will form the basis for the development 
of future therapeutics. 
The Cure: With Big Drugs Dying, Merck Didn't
Merge 

It Found New Ones 

Some Inspired Research, Aided By a Bit of Luck, 
Saves Company's Independence 

The Path to a Novel Painkiller 
By Gardiner Harris 
Staff Reporter of The Wall Street Journal 

For 15 years, Edward Scolnick, head of Merck &
Co.'s drug research, knew the company would be 
facing a crisis about now. For much of that time, he 
secretly feared that Merck might not survive it as an 
independent company. "I had some doubts that I 
didn't share with anybody," Dr. Scolnick says. 

Merck's problem, which at times has infected almost 
every big pharmaceuticals company, was that patents 
on several of its best-selling drugs would be expiring. 
Generic knockoffs would then eat deeply into market 
share and profits on drugs like Vasotec and Prinivil 
for hypertension, Mevacor for high cholesterol and 
Pepcid and Prilosec for ulcers. 

Ever since investors caught on to this, Wall Street 
has been insisting that Merck join the merger rush 
sweeping the pharmaceuticals industry. But its chief
executive, Raymond V. Gilmartin, steadfastly 
refused, insisting that Merck could grow briskly all 
by itself. 

He was right. Today, well into what was supposed 
to be the crunch, Merck is riding high. It topped all 
its peers in revenue growth last year, and most of 
them in earnings growth, analysts say. Its stock 
surged 26% in 2000 while the broad market was 
skidding. Instead of facing an acute need to save 
money, Merck is increasing its research spending by 
nearly 17% and its sales force by almost a third. 

"The safe thing would have been to seek a merger, 
emphasize generics, stay diversified and cut costs 
across the board," Mr. Gilmartin says. "We went
against the conventional wisdom at the time, stayed
with it and did it."



Mr. Gilmartin, 59 years old, who arrived at Merck 
after heading a medical-device company, gambled
that the pharmaceuticals giant's tradition of creativity 
and innovation in drug discovery would bail it out. A 
merger, by contrast, would dilute the power of this 
science-based culture -- one that has been a model for 
other drug companies — and be a distraction for 
years. 

Had he and his lieutenants been wrong, Merck's 
name might have wound up in the same graveyard as 
Warner-Lambert, Upjohn, Syntex, Sandoz, Ciba- 
Geigy, Rhone-Poulenc and Hoechst, all of which had 
to resort to mergers when their labs couldn't produce 
enough new drugs to replace old ones with expired 
patents. 

Merck's success demonstrates that in the drug 
business, as in Hollywood, one big hit can sway the 
fate of an entire company. And searching for 
blockbuster drugs is a matter of inspiration, scientific 
instincts and shrewd management -- assets that are 
hard to buy in a merger. 

In this case, the inspiration came from Peppi Prasit, 
a Thai-born medicinal chemist for Merck in 
Montreal. In July 1992, he found himself wandering 
around an obscure medical conference in that city. 
Chatting with a colleague, he learned that Merck 
researchers had developed a lab test to determine if a 
painkiller was less likely to cause the stomach upset 
that goes along with most pain and arthritis drugs. 

Moments later, Dr. Prasit, now 45, noticed a poster 
display from some researchers for a Japanese 
company claiming they had produced just such a 
nonirritating painkiller, though one that wasn't 
chemically fit to try in people. Dr. Prasit immediately 
went back to his Montreal lab, cooked up the 
mysterious molecule and put it to Merck's new test. 
When it passed, he set about trying to create a similar 
drug for humans. 

His work caught the eye of Dr. Scolnick, the 
research chief at Merck's sprawling laboratory 
northwest of Philadelphia. Dr. Scolnick, 60, is a 
former molecular biologist for the National Institutes 
of Health who joined Merck in 1982 and became its
top scientist three years later. A dour and fierce man, 
he works out of a small office with a view of 
industrial pipes and a door so narrow he has to slide 
sideways to get through. His adjoining conference 
room is decorated only with two bedraggled plastic 
plants. 

The unimpressive surroundings belie the critical 
nature of his job: Dr. Scolnick monitors hundreds of 
intriguing scientific leads floating around Merck's 
labs and decides where the company will make its big 
bets. Merck's winnowing process has evolved over 
three decades into committees of scientists who 
discuss one another's work with brutal frankness. It's 
a system of peer review modeled on one used at the 
NIH, and it "allows a really good debate about what 
we should be doing," says Dr. Scolnick. 

The system has helped Merck, in recent years, to 
bring out new drugs like Fosamax, to slow bone
deterioration in osteoporosis; Singulair for asthma; 
and big-selling medicines for high blood pressure, 
glaucoma, migraine and AIDS. But Dr. Scolnick says 
he didn't need a committee to tell him that Dr. Prasit's 
painkiller project had the potential to be a 
blockbuster, and a critical bridge out of Merck's 
patent problem. 

The class of painkillers called nonsteroidal anti- 
inflammatory drugs - like aspirin and ibuprofen - 
hadn't seen a major improvement in years. And 
thousands of Americans suffered ulcers each year 
because of the drugs' side effects. Preventing that 
would clearly be a huge advance. 

These drugs attack the inflammation that leads to 
pain by curbing production of prostaglandins, 
compounds that marshal the body's defenses. But 
prostaglandins also are involved in making the lining 
that protects the gut from digestive acids. The more a 
painkiller inhibited inflammation, the more it thinned 
the protective lining, increasing the risk of bleeding 
ulcers. 



Philip Needleman, a pharmacologist at Washington
University in St. Louis, had mapped out a potential
way around this. It would be a drug that inhibited
Cox-2, an enzyme that regulates prostaglandin
production in most of the body, but not Cox-1, a
similar enzyme involved only in the gut. Dr. Prasit
knew of this research and was determined to develop
just such a drug, especially since Merck had a test for
it.



But in this quest, Dr. Prasit and Dr. Scolnick feared 
they were in a race — and running second. Rumors 
swirled that Dr. Needleman, who subsequently 
crossed town to join Monsanto Co., was working on a 
similar drug for that company. 

So Dr. Scolnick ordered researchers in Montreal to 
pursue Dr. Prasit's work as fast as they could. "I 
would call up every other day and say, 'Hey, is 
everybody working on this project?' " recalls Dr. 
Scolnick with a rare smile. "They would always say, 
'Yes!' You don't know if they're telling the truth, but 
they got the message that it was important." 

Dr. Prasit's team synthesized hundreds of 
compounds, some of which worked great in the test 
tube but passed through laboratory mice with no 
clinical effect. Others mysteriously killed the mice. 
But by October 1994, the team had come up with two 
compounds that aced the test-tube tests and didn't 
hurt the mice, even at extremely high doses. 

Normally, Dr. Scolnick would have chosen one of 
these to put through the expensive and risky process 
of testing in humans. But the project was so 
important - and Merck appeared to be in such a 
high-stakes competition with Monsanto -- that he 
decided to put both compounds in clinical trials. 

It was a good move, because only one of the two 
ended up working. "One failed and the other didn't, 
and there was no way you could have looked at the 
preclinical data and predicted which one would 
succeed," Dr. Scolnick says. "That's just dumb luck." 



Meanwhile, Mr. Gilmartin, arriving in 1994, had 
other headaches. Growth at the company, so stellar in 
the late 1980s, had slowed. A health plan proposed 
by the Clinton administration threatened price 
controls. Powerful managed-care organizations were 
demanding deep discounts. "There were people that 
were questioning, in a managed-care environment, 
what was going to be the value of breakthrough 
research," the CEO says. "Merck, in fact, had even 
moved into the generics business. Everything was 
being questioned and challenged." 
Among his first moves was to squelch the push into
generics and sell off specialty-chemicals and 
agricultural subsidiaries. He also ordered his 
managers to make peace with managed-care 
companies. 

Merck had been fighting their demands for 
discounts, with the result that its products were 
increasingly being excluded from the "formulary" 
lists of large buying groups. "Merck was just in your 
face. If you tried to set up a meeting with them, they 
would refuse," says Lynn Detlor, president of 
American Healthcare Systems' Purchasing Partners 
LP, a huge group-purchasing organization. In Mr. 
Gilmartin's first week in 1994, he set up a meeting 
with that group's executives and promised that Merck 
would cooperate at every level, say two participants 
at the meeting. Within 18 months, the managed-care
group had increased its purchases of Merck products 
tenfold on an annualized basis, Mr. Detlor says. 



But most important, Mr. Gilmartin decided then to 
bet the company's future on the productivity of its 
labs. "Shortly after he came here, he came to one of 
our research meetings and stayed for dinner," Dr. 
Scolnick recalls. "As he was leaving, on the way out 
he said he wanted to talk to me. And he said, 'I want 
you to know that I have complete confidence in you. 
Just do your thing and I'm not going to bother you.'" 

That meant he had freedom to throw all the 
resources he wanted into Dr. Prasit's project in 
Montreal. 



In January 1995, Merck handed a batch of a 
potential new pain drug to Donald R. Mehlisch, an 
oral surgeon in Austin, Texas, who tests such 
medicines for manufacturers. He recruits students 
from the University of Texas, yanks out their wisdom 
teeth, gives them a pill and puts them into a dorm 
attached to his clinic to watch them suffer. "We 
create a lot of pain in what we do," he says 
cheerfully. 



A test drug's effectiveness is measured largely by 
how long it takes patients to insist that what they 
were given isn't working and they need something 
else. The tests are designed to be "double-blind," with 
neither doctor nor patient knowing which pill is
which. Even so, Dr. Mehlisch sensed that what he
was testing for Merck had potential. It was "the first 
time we've ever had a compound that has worked so 
well for so long," he says. 

Meanwhile, Monsanto, which was working on a 
similar drug just as Merck had suspected, ran its 
candidate through similar dental-pain tests. It failed 
them. However, it and Merck's drug were both good 
at relieving the longer-term pain of arthritis, without 
stomach irritation. 



To Merck's dismay, Monsanto completed its clinical 
studies first. Among the reasons was a dosage glitch 
at Merck. The company figured out only belatedly 
that, instead of as much as 1,000 milligrams, the 
proper dose was 12.5 mg. to 25 mg. The pills that 
resulted were so tiny that Merck was afraid arthritis 
patients wouldn't be able to pick them up. It enlarged 
them with edible filler, but that caused another 
problem -- the filler turned out to slow the drug's
absorption. Three months were lost while researchers 
worked to fix all this. 



On the last day of 1998, the Food and Drug 
Administration gave Monsanto approval to market its 
nonirritating painkiller, called Celebrex. In February 
1999, Monsanto began co-marketing it with Pfizer 
Inc. — and it quickly became the most successful 
drug launch in U.S. history. Merck still didn't even 
have marketing approval. 

Normally, a head start like that makes the first drug 
dominant and very hard to catch. Yet the way Merck 
handled its later launch would soon put its drug, 
called Vioxx, hot on Celebrex's trail. One reason: an 
expanded role for marketers within Merck. 

For decades, Merck's marketers hadn't been allowed 
anywhere near scientific-planning meetings. Mr.
Gilmartin's predecessor, P. Roy Vagelos, started to 
change this, persuading scientists to accept marketers 
in their midst by promising that they wouldn't speak. 
Then, speaking was allowed but not encouraged. 
Under Mr. Gilmartin, however, the marketers have 
become deeply involved in many of the scientists' 
development decisions, though they still have no 
involvement in early-stage research issues. 

Mr. Gilmartin created teams of marketing, 
manufacturing and research people that now plan far
ahead. "We deconstructed every task to see where we 
could cut out steps," says Wendy Dixon, a marketing 
vice president who oversaw the Vioxx launch. "We 
carved four or five weeks off the normal product- 
launch process." 

While the team made thousands of bottles and boxes 
in advance, they couldn't do that with the pills' 
instruction flier; the FDA doesn't bless it until the day 
of approval. With the rival drug already a hit, Merck's
challenge would be to get the approved copy from the 
FDA to print shops across the U.S. and Puerto Rico, 
print the fliers by the thousands, insert them with the 
pills and get the bottles to pharmacies in just days. 
Planes were placed on standby in case printing plates 
needed to be rushed to print shops elsewhere. 



Insiders knew that May 20, a Friday, was likely to 
be the day of FDA approval. Cheryl Ramsey- 
Weldon, the company's top formatter of instruction 
fliers, waited all day on tenterhooks. When her shift 
ended she went home and waited some more. "I was 
sitting by the phone," she says. "I called [her
supervisor] three times to see how close we were." 

She finally got the call at 10 p.m. Ready for bed 
with her contacts out, her hair up and her pajamas on, 
Ms. Ramsey-Weldon jumped into her car as she was
and raced the 3 1/2 miles to the plant. Four hours 
later, she had formatted the document and passed it 
along to a pair of proofreaders. At 2:30 a.m., she 
went home for a few hours' sleep. By 6 a.m. she was 
back for more. 



Merck's presses ran for days without stop. Then the 
fliers were folded and inserted. The bottles reached 
distribution centers on Monday afternoon. Vioxx was 
stocked in 40,000 pharmacies within 11 days of 
approval, a remarkable feat. 

Within three months of its launch, the Merck drug 
gained nearly a third of the brand-new market for 
"Cox-2 inhibitors," according to research firm IMS 
Health, and within a year it had nearly half. In 
Europe, Vioxx is dominant, having beaten Celebrex 
to market in most countries despite filing later. 
Helping Vioxx in the heated two-way competition: It 
acts more quickly than Celebrex and is more 
selective for the Cox-2 enzyme, according to 
independent studies. 

Both drugs have flaws, though. And now Merck is 
locked in another Cox-2 contest, racing Pharmacia 
Inc. -- which took over Monsanto -- to bring out
second-generation, improved versions of the hot- 
selling drugs. 

These days, Mr. Gilmartin has an uncharacteristic 
swagger. "We're going to another level at a time 
when most worried that Merck wouldn't even 
compete," he says. But Dr. Scolnick gives plenty of 
credit to the way things broke Merck's way during 
Vioxx's development. "If those first two compounds 
had failed [in human trials] and we had had by 
chance to rely on the fifth or sixth one" years later, he 
says, "we would be a very different company." 



Prescription for Success
Merck is gradually losing its exclusive rights to these drugs . . .

Drug (condition)          1999 U.S. sales   1999 world-wide sales   Expiration*
                          (in millions)     (in millions)
Vasotec (Hypertension)    $975              $2,300                  August 2000
Mevacor (Cholesterol)     $480              $600                    December 2001
Pepcid (Ulcers)           $820              $910                    April 2001
Prinivil (Hypertension)   $715              $815                    June 2002

But the company has drugs in the pipeline
Launches expected in 2001

-- Cancidas: intravenous anti-fungal drug; application submitted to
FDA July 2000

-- Invanz: intravenous antibiotic; application submitted to FDA
November 2000

-- Etoricoxib: Super Vioxx for arthritis; application to FDA expected
early 2001

*End of exclusivity 

NOTE: In November, the patent will also expire on Prilosec, an 
Astra-Zeneca drug for heartburn and gastro-esophageal reflux disease, 
from which Merck receives considerable revenue. 